init - Initialize project

Lee Nony
2022-05-06 01:58:53 +08:00
commit 90a5cc7cb6
6772 changed files with 2837787 additions and 0 deletions


@@ -0,0 +1,54 @@
# How to run custom OCR model {#tutorial_dnn_OCR}
@tableofcontents
@prev_tutorial{tutorial_dnn_custom_layers}
@next_tutorial{tutorial_dnn_text_spotting}
| | |
| -: | :- |
| Original author | Zihao Mu |
| Compatibility | OpenCV >= 4.3 |
## Introduction
In this tutorial, we first introduce how to obtain the custom OCR model, then show how to transform your own OCR models so that they can be run correctly by the opencv_dnn module, and finally provide some pre-trained models.
## Train your own OCR model
[This repository](https://github.com/zihaomu/deep-text-recognition-benchmark) is a good starting point for training your own OCR model. In this repository, MJSynth+SynthText is set as the training set by default. In addition, you can configure the model structure and the dataset you want to use.
## Transform OCR model to ONNX format and Use it in OpenCV DNN
After completing the model training, please use [transform_to_onnx.py](https://github.com/zihaomu/deep-text-recognition-benchmark/blob/master/transform_to_onnx.py) to convert the model into ONNX format.
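Once the ``.onnx`` file is obtained, it can be loaded with cv::dnn::readNetFromONNX. The snippet below is a minimal, illustrative Python sketch of running such a recognition model on a single cropped word image; the file names, the `100x32` grayscale input size, the `[-1, 1]` normalization and the alphabet are assumptions based on the default configuration of the repository above, so adjust them to your own training setup.
@code{.py}
import cv2 as cv
import numpy as np

# assumed alphabet of the pre-trained CTC models (digits + lowercase letters)
alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"

net = cv.dnn.readNetFromONNX("crnn.onnx")          # hypothetical path to the converted model

img = cv.imread("word.png", cv.IMREAD_GRAYSCALE)   # a cropped text region
# assumed preprocessing: 100x32 grayscale input scaled to roughly [-1, 1]
blob = cv.dnn.blobFromImage(img, scalefactor=1.0 / 127.5, size=(100, 32), mean=127.5)
net.setInput(blob)
out = net.forward()  # assumed output layout: (T, 1, len(alphabet) + 1), one CTC distribution per time step

# greedy CTC decoding: drop repeated symbols and the blank class (assumed to be index 0)
text, prev = "", 0
for idx in out.argmax(axis=2).flatten():
    if idx != 0 and idx != prev:
        text += alphabet[idx - 1]
    prev = idx
print(text)
@endcode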
#### Execute in webcam
A Python version of the example code can be found [here](https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.py).
Example:
@code{.bash}
$ text_detection -m=[path_to_text_detect_model] -ocr=[path_to_text_recognition_model]
@endcode
## Pre-trained ONNX models are provided
Some pre-trained models can be found at https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing.
Their performance at different text recognition datasets is shown in the table below:
| Model name | IIIT5k(%) | SVT(%) | ICDAR03(%) | ICDAR13(%) | ICDAR15(%) | SVTP(%) | CUTE80(%) | average acc (%) | parameter( x10^6 ) |
| -------------------- | --------- | ------ | ---------- | ---------- | ---------- | ------- | --------- | --------------- | ------------------ |
| DenseNet-CTC | 72.267 | 67.39 | 82.81 | 80 | 48.38 | 49.45 | 42.50 | 63.26 | 0.24 |
| DenseNet-BiLSTM-CTC | 73.76 | 72.33 | 86.15 | 83.15 | 50.67 | 57.984 | 49.826 | 67.69 | 3.63 |
| VGG-CTC | 75.96 | 75.42 | 85.92 | 83.54 | 54.89 | 57.52 | 50.17 | 69.06 | 5.57 |
| CRNN_VGG-BiLSTM-CTC | 82.63 | 82.07 | 92.96 | 88.867 | 66.28 | 71.01 | 62.37 | 78.03 | 8.45 |
| ResNet-CTC | 84.00 | 84.08 | 92.39 | 88.96 | 67.74 | 74.73 | 67.60 | 79.93 | 44.28 |
The performance of the text recognition models was tested with OpenCV DNN and does not include the text detection model.
#### Model selection suggestion:
The input of text recognition model is the output of the text detection model, which causes the performance of text detection to greatly affect the performance of text recognition.
DenseNet_CTC has the fewest parameters and the best FPS, and is suitable for edge devices, which are very sensitive to computational cost. If you have limited computing resources but want to achieve better accuracy, VGG_CTC is a good choice.
CRNN_VGG_BiLSTM_CTC is suitable for scenarios that require high recognition accuracy.



@@ -0,0 +1,107 @@
# How to run deep networks on Android device {#tutorial_dnn_android}
@tableofcontents
@prev_tutorial{tutorial_dnn_halide_scheduling}
@next_tutorial{tutorial_dnn_yolo}
| | |
| -: | :- |
| Original author | Dmitry Kurtaev |
| Compatibility | OpenCV >= 3.3 |
## Introduction
In this tutorial you will learn how to run deep learning networks on an Android device
using the OpenCV deep learning module.
This tutorial was written for the following versions of the corresponding software:
- Android Studio 2.3.3
- OpenCV 3.3.0+
## Requirements
- Download and install Android Studio from https://developer.android.com/studio.
- Get the latest pre-built OpenCV for Android release from https://github.com/opencv/opencv/releases and unpack it (for example, `opencv-4.X.Y-android-sdk.zip`).
- Download MobileNet object detection model from https://github.com/chuanqi305/MobileNet-SSD. We need a configuration file `MobileNetSSD_deploy.prototxt` and weights `MobileNetSSD_deploy.caffemodel`.
## Create an empty Android Studio project
- Open Android Studio. Start a new project. Let's call it `opencv_mobilenet`.
![](1_start_new_project.png)
- Keep default target settings.
![](2_start_new_project.png)
- Use "Empty Activity" template. Name activity as `MainActivity` with a
corresponding layout `activity_main`.
![](3_start_new_project.png)
![](4_start_new_project.png)
- Wait until the project is created. Go to `Run->Edit Configurations`.
Choose `USB Device` as the target device for runs.
![](5_setup.png)
Plug in your device and run the project. It should be installed and launched
successfully before we go further.
@note Read @ref tutorial_android_dev_intro in case of problems.
![](6_run_empty_project.png)
## Add OpenCV dependency
- Go to `File->New->Import module` and provide a path to `unpacked_OpenCV_package/sdk/java`. The name of the module is detected automatically.
Disable all features that Android Studio suggests on the next window.
![](7_import_module.png)
![](8_import_module.png)
- Open two files:
1. `AndroidStudioProjects/opencv_mobilenet/app/build.gradle`
2. `AndroidStudioProjects/opencv_mobilenet/openCVLibrary330/build.gradle`
Copy both `compileSdkVersion` and `buildToolsVersion` from the first file to
the second one.
`compileSdkVersion 14` -> `compileSdkVersion 26`
`buildToolsVersion "25.0.0"` -> `buildToolsVersion "26.0.1"`
- Build the project. There should be no errors at this point.
- Go to `File->Project Structure`. Add OpenCV module dependency.
![](9_opencv_dependency.png)
![](10_opencv_dependency.png)
- Install the appropriate OpenCV Manager from `unpacked_OpenCV_package/apk`
onto the target device (this only needs to be done once).
@code
adb install OpenCV_3.3.0_Manager_3.30_armeabi-v7a.apk
@endcode
- Congratulations! We're ready now to make a sample using OpenCV.
## Make a sample
Our sample takes pictures from a camera, forwards them into a deep network and
receives a set of rectangles, class identifiers and confidence values in the `[0, 1]`
range.
- First of all, we need to add a necessary widget which displays processed
frames. Modify `app/src/main/res/layout/activity_main.xml`:
@include android/mobilenet-objdetect/res/layout/activity_main.xml
- Put downloaded `MobileNetSSD_deploy.prototxt` and `MobileNetSSD_deploy.caffemodel`
into `app/build/intermediates/assets/debug` folder.
- Modify `/app/src/main/AndroidManifest.xml` to enable full-screen mode, set up
a correct screen orientation and allow the use of a camera.
@include android/mobilenet-objdetect/gradle/AndroidManifest.xml
- Replace content of `app/src/main/java/org/opencv/samples/opencv_mobilenet/MainActivity.java`:
@include android/mobilenet-objdetect/src/org/opencv/samples/opencv_mobilenet/MainActivity.java
- Launch the application and have fun!
![](11_demo.jpg)


@@ -0,0 +1,236 @@
# Custom deep learning layers support {#tutorial_dnn_custom_layers}
@tableofcontents
@prev_tutorial{tutorial_dnn_javascript}
@next_tutorial{tutorial_dnn_OCR}
| | |
| -: | :- |
| Original author | Dmitry Kurtaev |
| Compatibility | OpenCV >= 3.4.1 |
## Introduction
Deep learning is a fast-growing area. New approaches to building neural networks
usually introduce new types of layers. They could be modifications of existing
ones or implementations of novel research ideas.
OpenCV lets you import and run networks from different deep learning
frameworks. A number of the most popular layer types are implemented. However, you can face
a problem when your network cannot be imported into OpenCV because some of its layers are not implemented.
The first solution is to create a feature request at https://github.com/opencv/opencv/issues
mentioning details such as the source of the model and the type of the new layer. A new layer can
be implemented if the OpenCV community shares this need.
The second way is to define a **custom layer** so that OpenCV's deep learning engine
knows how to use it. This tutorial shows you how to customize the import of deep
learning models.
## Define a custom layer in C++
A deep learning layer is a building block of a network's pipeline.
It has connections to **input blobs** and produces results in **output blobs**.
It may also have trained **weights** and **hyper-parameters**.
Layers' names, types, weights and hyper-parameters are stored in files generated by
the native frameworks during training. If OpenCV meets an unknown layer type, it throws an
exception while trying to read the model:
```
Unspecified error: Can't create layer "layer_name" of type "MyType" in function getLayerInstance
```
To import the model correctly you have to derive a class from cv::dnn::Layer with
the following methods:
@snippet dnn/custom_layers.hpp A custom layer interface
And register it before the import:
@snippet dnn/custom_layers.hpp Register a custom layer
@note `MyType` is a type of unimplemented layer from the thrown exception.
Let's see what each of these methods does:
- Constructor
@snippet dnn/custom_layers.hpp MyLayer::MyLayer
Retrieves hyper-parameters from cv::dnn::LayerParams. If your layer has trainable
weights, they will already be stored in the layer's member cv::dnn::Layer::blobs.
- A static method `create`
@snippet dnn/custom_layers.hpp MyLayer::create
This method should create an instance of your layer and return a cv::Ptr to it.
- Output blobs' shape computation
@snippet dnn/custom_layers.hpp MyLayer::getMemoryShapes
Returns the layer's output shapes, which depend on the input shapes. You may request extra
memory using `internals`.
- Run a layer
@snippet dnn/custom_layers.hpp MyLayer::forward
Implement a layer's logic here. Compute outputs for given inputs.
@note OpenCV manages the memory allocated for layers. In most cases the same memory
can be reused between layers, so your `forward` implementation should not rely on
the second invocation of `forward` having the same data in `outputs` and `internals`.
- Optional `finalize` method
@snippet dnn/custom_layers.hpp MyLayer::finalize
The chain of methods is the following: the OpenCV deep learning engine calls the `create`
method once, then it calls `getMemoryShapes` for every created layer, and then you
can make some preparations that depend on the known input dimensions in cv::dnn::Layer::finalize.
After the network is initialized, only the `forward` method is called for every network input.
@note Varying the input blobs' sizes, such as height, width or batch size, makes OpenCV
reallocate all the internal memory. That leads to efficiency gaps. Try to initialize
and deploy models using a fixed batch size and image dimensions.
## Example: custom layer from Caffe
Let's create a custom layer `Interp` from https://github.com/cdmh/deeplab-public.
It's just a simple resize that takes an input blob of size `N x C x Hi x Wi` and returns
an output blob of size `N x C x Ho x Wo` where `N` is a batch size, `C` is a number of channels,
`Hi x Wi` and `Ho x Wo` are the input and output `height x width` respectively.
This layer has no trainable weights, but it has hyper-parameters to specify the output size.
For example,
~~~~~~~~~~~~~
layer {
name: "output"
type: "Interp"
bottom: "input"
top: "output"
interp_param {
height: 9
width: 8
}
}
~~~~~~~~~~~~~
This way, our implementation can look like this:
@snippet dnn/custom_layers.hpp InterpLayer
Next we need to register a new layer type and try to import the model.
@snippet dnn/custom_layers.hpp Register InterpLayer
## Example: custom layer from TensorFlow
This is an example of how to import a network with [tf.image.resize_bilinear](https://www.tensorflow.org/versions/master/api_docs/python/tf/image/resize_bilinear)
operation. This is also a resize but with an implementation different from OpenCV's or `Interp` above.
Let's create a single layer network:
~~~~~~~~~~~~~{.py}
inp = tf.placeholder(tf.float32, [2, 3, 4, 5], 'input')
resized = tf.image.resize_bilinear(inp, size=[9, 8], name='resize_bilinear')
~~~~~~~~~~~~~
OpenCV sees this TensorFlow graph in the following way:
```
node {
name: "input"
op: "Placeholder"
attr {
key: "dtype"
value {
type: DT_FLOAT
}
}
}
node {
name: "resize_bilinear/size"
op: "Const"
attr {
key: "dtype"
value {
type: DT_INT32
}
}
attr {
key: "value"
value {
tensor {
dtype: DT_INT32
tensor_shape {
dim {
size: 2
}
}
tensor_content: "\t\000\000\000\010\000\000\000"
}
}
}
}
node {
name: "resize_bilinear"
op: "ResizeBilinear"
input: "input:0"
input: "resize_bilinear/size"
attr {
key: "T"
value {
type: DT_FLOAT
}
}
attr {
key: "align_corners"
value {
b: false
}
}
}
library {
}
```
Custom layer import from TensorFlow is designed to put all of a layer's `attr` values into
cv::dnn::LayerParams, but input `Const` blobs into cv::dnn::Layer::blobs.
In our case, the resize output shape will be stored in the layer's `blobs[0]`.
@snippet dnn/custom_layers.hpp ResizeBilinearLayer
Next we register a layer and try to import the model.
@snippet dnn/custom_layers.hpp Register ResizeBilinearLayer
## Define a custom layer in Python
The following example shows how to customize OpenCV's layers in Python.
Let's consider the [Holistically-Nested Edge Detection](https://arxiv.org/abs/1504.06375)
deep learning model. It was trained with one and only one difference compared to
the current version of the [Caffe framework](http://caffe.berkeleyvision.org/): `Crop`
layers, which receive two input blobs and crop the first one to match the spatial dimensions
of the second one, used to crop from the center. Nowadays Caffe's layer does it
from the top-left corner. So using the latest version of Caffe or OpenCV you'll
get shifted results with filled borders.
Next we're going to replace OpenCV's `Crop` layer, which crops from the top-left corner, with
a centered one.
- Create a class with `getMemoryShapes` and `forward` methods
@snippet dnn/edge_detection.py CropLayer
@note Both methods should return lists.
- Register a new layer.
@snippet dnn/edge_detection.py Register
That's it! We've replaced an already implemented OpenCV layer with a custom one.
You may find a full script in the [source code](https://github.com/opencv/opencv/tree/master/samples/dnn/edge_detection.py).
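For reference, a self-contained sketch of such a custom layer is shown below. It follows the same interface as the snippets above; the center-crop logic here is a simplified illustration, not a verbatim copy of the sample:
~~~~~~~~~~~~~{.py}
import cv2 as cv

class CenterCropLayer(object):
    def __init__(self, params, blobs):
        # this layer has no hyper-parameters or weights to read
        pass

    def getMemoryShapes(self, inputs):
        # the first input is cropped to the spatial size of the second one
        inputShape, targetShape = inputs[0], inputs[1]
        batchSize, numChannels = inputShape[0], inputShape[1]
        height, width = targetShape[2], targetShape[3]
        return [[batchSize, numChannels, height, width]]

    def forward(self, inputs):
        inp, target = inputs[0], inputs[1]
        ystart = (inp.shape[2] - target.shape[2]) // 2
        xstart = (inp.shape[3] - target.shape[3]) // 2
        yend = ystart + target.shape[2]
        xend = xstart + target.shape[3]
        return [inp[:, :, ystart:yend, xstart:xend]]

# replace OpenCV's own Crop implementation by the custom one
cv.dnn_registerLayer('Crop', CenterCropLayer)
~~~~~~~~~~~~~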
<table border="0">
<tr>
<td>![](js_tutorials/js_assets/lena.jpg)</td>
<td>![](images/lena_hed.jpg)</td>
</tr>
</table>


@@ -0,0 +1,74 @@
Load Caffe framework models {#tutorial_dnn_googlenet}
===========================
@tableofcontents
@next_tutorial{tutorial_dnn_halide}
| | |
| -: | :- |
| Original author | Vitaliy Lyudvichenko |
| Compatibility | OpenCV >= 3.3 |
Introduction
------------
In this tutorial you will learn how to use the opencv_dnn module for image classification by using
a GoogLeNet trained network from the [Caffe model zoo](http://caffe.berkeleyvision.org/model_zoo.html).
We will demonstrate results of this example on the following picture.
![Buran space shuttle](images/space_shuttle.jpg)
Source Code
-----------
We will be using snippets from the example application, which can be downloaded [here](https://github.com/opencv/opencv/blob/master/samples/dnn/classification.cpp).
@include dnn/classification.cpp
Explanation
-----------
-# Firstly, download GoogLeNet model files:
[bvlc_googlenet.prototxt ](https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/bvlc_googlenet.prototxt) and
[bvlc_googlenet.caffemodel](http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel)
You also need a file with the names of the [ILSVRC2012](http://image-net.org/challenges/LSVRC/2012/browse-synsets) classes:
[classification_classes_ILSVRC2012.txt](https://github.com/opencv/opencv/blob/master/samples/data/dnn/classification_classes_ILSVRC2012.txt).
Put these files into the working directory of this example program.
-# Read and initialize the network using the paths to the .prototxt and .caffemodel files
@snippet dnn/classification.cpp Read and initialize network
You can skip the `framework` argument if one of the `model` or `config` files has a
`.caffemodel` or `.prototxt` extension.
In this case the cv::dnn::readNet function automatically detects the model's format.
-# Read the input image and convert it to a blob acceptable by GoogLeNet
@snippet dnn/classification.cpp Open a video file or an image file or a camera stream
cv::VideoCapture can load both images and videos.
@snippet dnn/classification.cpp Create a 4D blob from a frame
We convert the image to a 4-dimensional blob (a so-called batch) with `1x3x224x224` shape
after applying the necessary pre-processing, such as resizing and mean subtraction
`(-104, -117, -123)` for the blue, green and red channels respectively, using the cv::dnn::blobFromImage function.
-# Pass the blob to the network
@snippet dnn/classification.cpp Set input blob
-# Make forward pass
@snippet dnn/classification.cpp Make forward pass
During the forward pass the output of each network layer is computed, but in this example we need the output of the last layer only.
-# Determine the best class
@snippet dnn/classification.cpp Get a class with a highest score
The network output, which contains probabilities for each of the 1000 ILSVRC2012 image classes, is stored in the `prob` blob.
We find the index of the element with the maximal value in it. This index corresponds to the class of the image.
-# Run an example from command line
@code
./example_dnn_classification --model=bvlc_googlenet.caffemodel --config=bvlc_googlenet.prototxt --width=224 --height=224 --classes=classification_classes_ILSVRC2012.txt --input=space_shuttle.jpg --mean="104 117 123"
@endcode
For our image we get a prediction of class `space shuttle` with more than 99% confidence.
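For comparison, a minimal Python sketch of the same pipeline is shown below (assuming the files downloaded in step 1 are in the working directory; the C++ sample above remains the reference implementation):
@code{.py}
import cv2 as cv
import numpy as np

# model files downloaded in step 1, assumed to be in the working directory
net = cv.dnn.readNet("bvlc_googlenet.caffemodel", "bvlc_googlenet.prototxt")
with open("classification_classes_ILSVRC2012.txt") as f:
    classes = f.read().strip().split("\n")

frame = cv.imread("space_shuttle.jpg")
# 1x3x224x224 blob with the mean (104, 117, 123) subtracted from the B, G, R channels
blob = cv.dnn.blobFromImage(frame, scalefactor=1.0, size=(224, 224), mean=(104, 117, 123))
net.setInput(blob)
prob = net.forward()

class_id = int(np.argmax(prob))
confidence = float(prob.flatten()[class_id])
print("%s: %.4f" % (classes[class_id], confidence))
@endcode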


@@ -0,0 +1,88 @@
# How to enable the Halide backend for improved efficiency {#tutorial_dnn_halide}
@tableofcontents
@prev_tutorial{tutorial_dnn_googlenet}
@next_tutorial{tutorial_dnn_halide_scheduling}
| | |
| -: | :- |
| Original author | Dmitry Kurtaev |
| Compatibility | OpenCV >= 3.3 |
## Introduction
This tutorial describes how to run your models in the OpenCV deep learning module
using the Halide language backend. Halide is an open-source project that lets us
write image processing algorithms in a well-readable format, schedule computations
according to a specific device and evaluate them with quite good efficiency.
An official website of the Halide project: http://halide-lang.org/.
An up to date efficiency comparison: https://github.com/opencv/opencv/wiki/DNN-Efficiency
## Requirements
### LLVM compiler
@note LLVM compilation might take a long time.
- Download LLVM source code from http://releases.llvm.org/4.0.0/llvm-4.0.0.src.tar.xz.
Unpack it. Let **llvm_root** be the root directory of the source code.
- Create directory **llvm_root**/tools/clang
- Download Clang with the same version as LLVM. In our case it will be from
http://releases.llvm.org/4.0.0/cfe-4.0.0.src.tar.xz. Unpack it into
**llvm_root**/tools/clang. Note that this directory should be the root of the Clang source code.
- Build LLVM on Linux
@code
cd llvm_root
mkdir build && cd build
cmake -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_BUILD_TYPE=Release ..
make -j4
@endcode
- Build LLVM on Windows (Developer Command Prompt)
@code
mkdir \\path-to-llvm-build\\ && cd \\path-to-llvm-build\\
cmake.exe -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=\\path-to-llvm-install\\ -G "Visual Studio 14 Win64" \\path-to-llvm-src\\
MSBuild.exe /m:4 /t:Build /p:Configuration=Release .\\INSTALL.vcxproj
@endcode
@note `\\path-to-llvm-build\\` and `\\path-to-llvm-install\\` are different directories.
### Halide language
- Download the source code from the GitHub repository https://github.com/halide/Halide,
either as an archive or using git. The root directory will be referred to as **halide_root**.
@code
git clone https://github.com/halide/Halide.git
@endcode
- Build Halide on Linux
@code
cd halide_root
mkdir build && cd build
cmake -DLLVM_DIR=llvm_root/build/lib/cmake/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_VERSION=40 -DWITH_TESTS=OFF -DWITH_APPS=OFF -DWITH_TUTORIALS=OFF ..
make -j4
@endcode
- Build Halide on Windows (Developer Command Prompt)
@code
cd halide_root
mkdir build && cd build
cmake.exe -DLLVM_DIR=\\path-to-llvm-install\\lib\\cmake\\llvm -DLLVM_VERSION=40 -DWITH_TESTS=OFF -DWITH_APPS=OFF -DWITH_TUTORIALS=OFF -DCMAKE_BUILD_TYPE=Release -G "Visual Studio 14 Win64" ..
MSBuild.exe /m:4 /t:Build /p:Configuration=Release .\\ALL_BUILD.vcxproj
@endcode
## Build OpenCV with Halide backend
When you build OpenCV add the following configuration flags:
- `WITH_HALIDE` - enable Halide linkage
- `HALIDE_ROOT_DIR` - path to Halide build directory
## Set Halide as a preferable backend
@code
net.setPreferableBackend(DNN_BACKEND_HALIDE);
@endcode


@@ -0,0 +1,92 @@
# How to schedule your network for Halide backend {#tutorial_dnn_halide_scheduling}
@tableofcontents
@prev_tutorial{tutorial_dnn_halide}
@next_tutorial{tutorial_dnn_android}
| | |
| -: | :- |
| Original author | Dmitry Kurtaev |
| Compatibility | OpenCV >= 3.3 |
## Introduction
Halide code is the same for every device we use. But to achieve satisfactory
efficiency we should schedule computations properly. In this tutorial we describe
ways to schedule your networks using the Halide backend in the OpenCV deep learning module.
For a better understanding of Halide scheduling you might want to read the tutorials at http://halide-lang.org/tutorials.
If this is your first encounter with Halide in OpenCV, we recommend starting from @ref tutorial_dnn_halide.
## Configuration files
You can schedule computations of the Halide pipeline by writing textual configuration files.
This means you can easily vectorize, parallelize and manage the loop order of
layer computations. Pass the path to a file with scheduling directives for a specific
device into ```cv::dnn::Net::setHalideScheduler``` before the first ```cv::dnn::Net::forward``` call.
Scheduling configuration files are YAML files where each node is a
scheduled function or a scheduling directive.
@code
relu1:
reorder: [x, c, y]
split: { y: 2, c: 8 }
parallel: [yo, co]
unroll: yi
vectorize: { x: 4 }
conv1_constant_exterior:
compute_at: { relu1: yi }
@endcode
By convention, the variable `n` is used for the batch dimension, `c` for channels,
`y` for rows and `x` for columns. Variables produced by a split use names
with the same prefix but `o` and `i` suffixes for the outer and inner variables
respectively. For example, for a variable `x` in range `[0, 10)` the directive
`split: { x: 2 }` gives new variables `xo` in range `[0, 5)` and `xi` in range `[0, 2)`.
The variable name `x` is no longer available in the same scheduling node.
You can find scheduling examples at [opencv_extra/testdata/dnn](https://github.com/opencv/opencv_extra/tree/master/testdata/dnn)
and use them to schedule your own networks.
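A minimal, illustrative sketch of wiring such a configuration file into a network is shown below (Python is used for brevity and the model and scheduler file names are hypothetical; the same calls exist in the C++ API):
@code{.py}
import cv2 as cv
import numpy as np

net = cv.dnn.readNetFromCaffe("deploy.prototxt", "weights.caffemodel")  # hypothetical model files
net.setPreferableBackend(cv.dnn.DNN_BACKEND_HALIDE)
# the scheduling file must be set before the first forward() call
net.setHalideScheduler("my_scheduler.yml")

blob = np.random.standard_normal((1, 3, 224, 224)).astype(np.float32)   # dummy input
net.setInput(blob)
out = net.forward()
@endcode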
## Layers fusing
Thanks to layer fusion we can schedule only the top layers of fused sets,
because for every output value we use the fused formula.
For example, if you have three consecutive layers Convolution + Scale + ReLU,
@code
conv(x, y, c, n) = sum(...) + bias(c);
scale(x, y, c, n) = conv(x, y, c, n) * weights(c);
relu(x, y, c, n) = max(scale(x, y, c, n), 0);
@endcode
the fused function is something like
@code
relu(x, y, c, n) = max((sum(...) + bias(c)) * weights(c), 0);
@endcode
So only the function called `relu` requires scheduling.
## Scheduling patterns
Sometimes networks are built with a blocked structure, which means some layers are
identical or quite similar. If you want to apply the same scheduling to
different layers up to tiling or vectorization factors, define scheduling
patterns in the `patterns` section at the beginning of the scheduling file.
Also, your patterns may use some parametric variables.
@code
# At the beginning of the file
patterns:
fully_connected:
split: { c: c_split }
fuse: { src: [x, y, co], dst: block }
parallel: block
vectorize: { ci: c_split }
# Somewhere below
fc8:
pattern: fully_connected
params: { c_split: 8 }
@endcode
## Automatic scheduling
You can let DNN schedule layers automatically. Just skip the call to ```cv::dnn::Net::setHalideScheduler```. Sometimes it might be even more efficient than manual scheduling.
If specific layers require manual scheduling, you can
mix manual and automatic scheduling: write a scheduling file
and skip the layers that you want to be scheduled automatically.


@@ -0,0 +1,54 @@
# How to run deep networks in browser {#tutorial_dnn_javascript}
@tableofcontents
@prev_tutorial{tutorial_dnn_yolo}
@next_tutorial{tutorial_dnn_custom_layers}
| | |
| -: | :- |
| Original author | Dmitry Kurtaev |
| Compatibility | OpenCV >= 3.3.1 |
## Introduction
This tutorial will show us how to run deep learning models using OpenCV.js right
in a browser. The tutorial refers to a sample pipeline of face detection and face recognition
models.
## Face detection
The face detection network takes a BGR image as input and produces a set of bounding boxes
that might contain faces. All we need to do is select the boxes with strong
confidence.
## Face recognition
The network is called OpenFace (project https://github.com/cmusatyalab/openface).
The face recognition model receives an RGB face image of size `96x96`. It then returns a
`128`-dimensional unit vector that represents the input face as a point on a
multidimensional unit sphere. So the difference between two faces is the angle between their
output vectors.
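To illustrate this matching rule, here is a hedged sketch in Python rather than the JavaScript used by the sample (the `0.5` threshold is an arbitrary assumption):
@code{.py}
import numpy as np

def best_match(query, known_vectors, names, threshold=0.5):
    # all vectors are assumed to be 128-dimensional and unit-length, so the dot
    # product equals the cosine of the angle between two faces
    best_name, best_score = "unknown", threshold
    for name, vec in zip(names, known_vectors):
        score = float(np.dot(query, vec))
        if score > best_score:
            best_name, best_score = name, score
    return best_name
@endcode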
## Sample
The whole sample is an HTML page with JavaScript code that uses OpenCV.js functionality.
The page is embedded below. Press the `Start` button to begin the demo.
Press `Add a person` to name a person that is currently recognized as unknown.
Next we'll discuss the main parts of the code.
@htmlinclude js_face_recognition.html
-# Run the face detection network to detect faces in the input image.
@snippet dnn/js_face_recognition.html Run face detection model
You may play with the input blob sizes to balance detection quality and efficiency.
The bigger the input blob, the smaller the faces that can be detected.
-# Run the face recognition network to obtain a `128`-dimensional unit feature vector for the input face image.
@snippet dnn/js_face_recognition.html Get 128 floating points feature vector
-# Perform a recognition.
@snippet dnn/js_face_recognition.html Recognize
Match a new feature vector against the registered ones and return the name of the best-matched person.
-# The main loop.
@snippet dnn/js_face_recognition.html Define frames processing
The main loop of our application receives frames from a camera and performs recognition
of every detected face in the frame. We start this function once OpenCV.js has been
initialized and the deep learning models have been downloaded.



@@ -0,0 +1,220 @@
# Conversion of PyTorch Classification Models and Launch with OpenCV C++ {#pytorch_cls_c_tutorial_dnn_conversion}
@prev_tutorial{pytorch_cls_tutorial_dnn_conversion}
| | |
| -: | :- |
| Original author | Anastasia Murzova |
| Compatibility | OpenCV >= 4.5 |
## Goals
In this tutorial you will learn how to:
* convert PyTorch classification models into ONNX format
* run converted PyTorch model with OpenCV C/C++ API
* provide model inference
We will explore the above-listed points by the example of ResNet-50 architecture.
## Introduction
Let's briefly review the key concepts involved in the pipeline of PyTorch model transition with the OpenCV API. The initial step in the conversion of PyTorch models into cv::dnn::Net
is transferring the model into [ONNX](https://onnx.ai/about.html) format. ONNX aims at the interchangeability of neural networks between various frameworks. There is a built-in function in PyTorch for ONNX conversion: [``torch.onnx.export``](https://pytorch.org/docs/stable/onnx.html#torch.onnx.export).
The obtained ``.onnx`` model is then passed into cv::dnn::readNetFromONNX or cv::dnn::readNet.
## Requirements
To be able to experiment with the below code you will need to install a set of libraries. We will use a virtual environment with python3.7+ for this:
```console
virtualenv -p /usr/bin/python3.7 <env_dir_path>
source <env_dir_path>/bin/activate
```
For OpenCV-Python building from source, follow the corresponding instructions from the @ref tutorial_py_table_of_contents_setup.
Before you start the installation of the libraries, you can customize the [requirements.txt](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/requirements.txt), excluding or including (for example, ``opencv-python``) some dependencies.
The below line initiates requirements installation into the previously activated virtual environment:
```console
pip install -r requirements.txt
```
## Practice
In this part we are going to cover the following points:
1. create a classification model conversion pipeline
2. provide the inference, process prediction results
### Model Conversion Pipeline
The code in this subchapter is located in the ``samples/dnn/dnn_model_runner`` module and can be executed with the line:
```console
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_resnet50_onnx
```
The following code contains the description of the below-listed steps:
1. instantiate PyTorch model
2. convert PyTorch model into ``.onnx``
```python
# initialize PyTorch ResNet-50 model
original_model = models.resnet50(pretrained=True)
# get the path to the converted into ONNX PyTorch model
full_model_path = get_pytorch_onnx_model(original_model)
print("PyTorch ResNet-50 model was successfully converted: ", full_model_path)
```
``get_pytorch_onnx_model(original_model)`` function is based on ``torch.onnx.export(...)`` call:
```python
# define the directory for further converted model save
onnx_model_path = "models"
# define the name of further converted model
onnx_model_name = "resnet50.onnx"
# create directory for further converted model
os.makedirs(onnx_model_path, exist_ok=True)
# get full path to the converted model
full_model_path = os.path.join(onnx_model_path, onnx_model_name)
# generate model input
generated_input = Variable(
torch.randn(1, 3, 224, 224)
)
# model export into ONNX format
torch.onnx.export(
original_model,
generated_input,
full_model_path,
verbose=True,
input_names=["input"],
output_names=["output"],
opset_version=11
)
```
After the successful execution of the above code we will get the following output:
```console
PyTorch ResNet-50 model was successfully converted: models/resnet50.onnx
```
The ``dnn_model_runner`` module proposed in ``samples/dnn`` allows us to reproduce the above conversion steps for the following PyTorch classification models:
* alexnet
* vgg11
* vgg13
* vgg16
* vgg19
* resnet18
* resnet34
* resnet50
* resnet101
* resnet152
* squeezenet1_0
* squeezenet1_1
* resnext50_32x4d
* resnext101_32x8d
* wide_resnet50_2
* wide_resnet101_2
To obtain the converted model, the following line should be executed:
```
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name <pytorch_cls_model_name> --evaluate False
```
For the ResNet-50 case the below line should be run:
```
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name resnet50 --evaluate False
```
The default root directory for the converted model storage is defined in module ``CommonConfig``:
```python
@dataclass
class CommonConfig:
output_data_root_dir: str = "dnn_model_runner/dnn_conversion"
```
Thus, the converted ResNet-50 will be saved in ``dnn_model_runner/dnn_conversion/models``.
### Inference Pipeline
Now we can use ```models/resnet50.onnx``` for the inference pipeline using OpenCV C/C++ API. The implemented pipeline can be found in [samples/dnn/classification.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/classification.cpp).
After the samples are built (the ``BUILD_EXAMPLES`` flag should be set to ``ON``), the ``example_dnn_classification`` executable file is provided.
To provide model inference we will use the below [squirrel photo](https://www.pexels.com/photo/brown-squirrel-eating-1564292) (under [CC0](https://www.pexels.com/terms-of-service/) license) corresponding to ImageNet class ID 335:
```console
fox squirrel, eastern fox squirrel, Sciurus niger
```
![Classification model input image](images/squirrel_cls.jpg)
For the label decoding of the obtained prediction, we also need ``imagenet_classes.txt`` file, which contains the full list of the ImageNet classes.
In this tutorial we will run the inference process for the converted PyTorch ResNet-50 model from the build (``samples/build``) directory:
```
./dnn/example_dnn_classification --model=../dnn/models/resnet50.onnx --input=../data/squirrel_cls.jpg --width=224 --height=224 --rgb=true --scale="0.003921569" --mean="123.675 116.28 103.53" --std="0.229 0.224 0.225" --crop=true --initial_width=256 --initial_height=256 --classes=../data/dnn/classification_classes_ILSVRC2012.txt
```
Let's explore ``classification.cpp`` key points step by step:
1. read the model with cv::dnn::readNet, initialize the network:
```cpp
Net net = readNet(model, config, framework);
```
The ``model`` parameter value is taken from ``--model`` key. In our case, it is ``resnet50.onnx``.
* preprocess input image:
```cpp
if (rszWidth != 0 && rszHeight != 0)
{
resize(frame, frame, Size(rszWidth, rszHeight));
}
// Create a 4D blob from a frame
blobFromImage(frame, blob, scale, Size(inpWidth, inpHeight), mean, swapRB, crop);
// Check std values.
if (std.val[0] != 0.0 && std.val[1] != 0.0 && std.val[2] != 0.0)
{
// Divide blob by std.
divide(blob, std, blob);
}
```
In this step we use cv::dnn::blobFromImage function to prepare model input.
We set ``Size(rszWidth, rszHeight)`` with ``--initial_width=256 --initial_height=256`` for the initial image resize as it's described in [PyTorch ResNet inference pipeline](https://pytorch.org/hub/pytorch_vision_resnet/).
It should be noted that in cv::dnn::blobFromImage the mean value is subtracted first, and only then are the pixel values multiplied by the scale.
Thus, we use ``--mean="123.675 116.28 103.53"``, which is equivalent to ``[0.485, 0.456, 0.406]`` multiplied by ``255.0``, to reproduce the original image preprocessing order for PyTorch classification models:
```python
img /= 255.0
img -= [0.485, 0.456, 0.406]
img /= [0.229, 0.224, 0.225]
```
* make forward pass:
```cpp
net.setInput(blob);
Mat prob = net.forward();
```
* process the prediction:
```cpp
Point classIdPoint;
double confidence;
minMaxLoc(prob.reshape(1, 1), 0, &confidence, 0, &classIdPoint);
int classId = classIdPoint.x;
```
Here we choose the most likely object class. The ``classId`` result for our case is 335 - fox squirrel, eastern fox squirrel, Sciurus niger:
![ResNet50 OpenCV C++ inference output](images/opencv_resnet50_test_res_c.jpg)


@@ -0,0 +1,362 @@
# Conversion of PyTorch Classification Models and Launch with OpenCV Python {#pytorch_cls_tutorial_dnn_conversion}
@prev_tutorial{tutorial_dnn_OCR}
@next_tutorial{pytorch_cls_c_tutorial_dnn_conversion}
| | |
| -: | :- |
| Original author | Anastasia Murzova |
| Compatibility | OpenCV >= 4.5 |
## Goals
In this tutorial you will learn how to:
* convert PyTorch classification models into ONNX format
* run converted PyTorch model with OpenCV Python API
* obtain an evaluation of the PyTorch and OpenCV DNN models.
We will explore the above-listed points by the example of the ResNet-50 architecture.
## Introduction
Let's briefly review the key concepts involved in the pipeline of PyTorch model transition with the OpenCV API. The initial step in the conversion of PyTorch models into cv.dnn.Net
is transferring the model into [ONNX](https://onnx.ai/about.html) format. ONNX aims at the interchangeability of neural networks between various frameworks. There is a built-in function in PyTorch for ONNX conversion: [``torch.onnx.export``](https://pytorch.org/docs/stable/onnx.html#torch.onnx.export).
The obtained ``.onnx`` model is then passed into cv.dnn.readNetFromONNX.
## Requirements
To be able to experiment with the below code you will need to install a set of libraries. We will use a virtual environment with python3.7+ for this:
```console
virtualenv -p /usr/bin/python3.7 <env_dir_path>
source <env_dir_path>/bin/activate
```
For OpenCV-Python building from source, follow the corresponding instructions from the @ref tutorial_py_table_of_contents_setup.
Before you start the installation of the libraries, you can customize the [requirements.txt](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/requirements.txt), excluding or including (for example, ``opencv-python``) some dependencies.
The below line initiates requirements installation into the previously activated virtual environment:
```console
pip install -r requirements.txt
```
## Practice
In this part we are going to cover the following points:
1. create a classification model conversion pipeline and provide the inference
2. evaluate and test classification models
If you'd like merely to run evaluation or test model pipelines, the "Model Conversion Pipeline" part can be skipped.
### Model Conversion Pipeline
The code in this subchapter is located in the ``dnn_model_runner`` module and can be executed with the line:
```console
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_resnet50
```
The following code contains the description of the below-listed steps:
1. instantiate PyTorch model
2. convert PyTorch model into ``.onnx``
3. read the transferred network with OpenCV API
4. prepare input data
5. provide inference
```python
# initialize PyTorch ResNet-50 model
original_model = models.resnet50(pretrained=True)
# get the path to the converted into ONNX PyTorch model
full_model_path = get_pytorch_onnx_model(original_model)
# read converted .onnx model with OpenCV API
opencv_net = cv2.dnn.readNetFromONNX(full_model_path)
print("OpenCV model was successfully read. Layer IDs: \n", opencv_net.getLayerNames())
# get preprocessed image
input_img = get_preprocessed_img("../data/squirrel_cls.jpg")
# get ImageNet labels
imagenet_labels = get_imagenet_labels("../data/dnn/classification_classes_ILSVRC2012.txt")
# obtain OpenCV DNN predictions
get_opencv_dnn_prediction(opencv_net, input_img, imagenet_labels)
# obtain original PyTorch ResNet50 predictions
get_pytorch_dnn_prediction(original_model, input_img, imagenet_labels)
```
To provide model inference we will use the below [squirrel photo](https://www.pexels.com/photo/brown-squirrel-eating-1564292) (under [CC0](https://www.pexels.com/terms-of-service/) license) corresponding to ImageNet class ID 335:
```console
fox squirrel, eastern fox squirrel, Sciurus niger
```
![Classification model input image](images/squirrel_cls.jpg)
For the label decoding of the obtained prediction, we also need ``imagenet_classes.txt`` file, which contains the full list of the ImageNet classes.
Let's go deeper into each step by the example of pretrained PyTorch ResNet-50:
* instantiate PyTorch ResNet-50 model:
```python
# initialize PyTorch ResNet-50 model
original_model = models.resnet50(pretrained=True)
```
* convert PyTorch model into ONNX:
```python
# define the directory for further converted model save
onnx_model_path = "models"
# define the name of further converted model
onnx_model_name = "resnet50.onnx"
# create directory for further converted model
os.makedirs(onnx_model_path, exist_ok=True)
# get full path to the converted model
full_model_path = os.path.join(onnx_model_path, onnx_model_name)
# generate model input
generated_input = Variable(
torch.randn(1, 3, 224, 224)
)
# model export into ONNX format
torch.onnx.export(
original_model,
generated_input,
full_model_path,
verbose=True,
input_names=["input"],
output_names=["output"],
opset_version=11
)
```
After the successful execution of the above code, we will get ``models/resnet50.onnx``.
* read the transferred network with cv.dnn.readNetFromONNX, passing the ONNX model obtained in the previous step into it:
```python
# read converted .onnx model with OpenCV API
opencv_net = cv2.dnn.readNetFromONNX(full_model_path)
```
* prepare input data:
```python
# read the image
input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
input_img = input_img.astype(np.float32)
input_img = cv2.resize(input_img, (256, 256))
# define preprocess parameters
mean = np.array([0.485, 0.456, 0.406]) * 255.0
scale = 1 / 255.0
std = [0.229, 0.224, 0.225]
# prepare input blob to fit the model input:
# 1. subtract mean
# 2. scale to set pixel values from 0 to 1
input_blob = cv2.dnn.blobFromImage(
image=input_img,
scalefactor=scale,
size=(224, 224), # img target size
mean=mean,
swapRB=True, # BGR -> RGB
crop=True # center crop
)
# 3. divide by std
input_blob[0] /= np.asarray(std, dtype=np.float32).reshape(3, 1, 1)
```
In this step we read the image and prepare the model input with the cv.dnn.blobFromImage function, which returns a 4-dimensional blob.
It should be noted that in cv.dnn.blobFromImage the mean value is subtracted first, and only then are the pixel values multiplied by the scale. Thus, ``mean`` is multiplied by ``255.0`` to reproduce the original image preprocessing order:
```python
img /= 255.0
img -= [0.485, 0.456, 0.406]
img /= [0.229, 0.224, 0.225]
```
* OpenCV cv.dnn.Net inference:
```python
# set OpenCV DNN input
opencv_net.setInput(preproc_img)
# OpenCV DNN inference
out = opencv_net.forward()
print("OpenCV DNN prediction: \n")
print("* shape: ", out.shape)
# get the predicted class ID
imagenet_class_id = np.argmax(out)
# get confidence
confidence = out[0][imagenet_class_id]
print("* class ID: {}, label: {}".format(imagenet_class_id, imagenet_labels[imagenet_class_id]))
print("* confidence: {:.4f}".format(confidence))
```
After the above code execution we will get the following output:
```console
OpenCV DNN prediction:
* shape: (1, 1000)
* class ID: 335, label: fox squirrel, eastern fox squirrel, Sciurus niger
* confidence: 14.8308
```
* PyTorch ResNet-50 model inference:
```python
original_net.eval()
preproc_img = torch.FloatTensor(preproc_img)
# inference
out = original_net(preproc_img)
print("\nPyTorch model prediction: \n")
print("* shape: ", out.shape)
# get the predicted class ID
imagenet_class_id = torch.argmax(out, axis=1).item()
print("* class ID: {}, label: {}".format(imagenet_class_id, imagenet_labels[imagenet_class_id]))
# get confidence
confidence = out[0][imagenet_class_id]
print("* confidence: {:.4f}".format(confidence.item()))
```
After the above code launching we will get the following output:
```console
PyTorch model prediction:
* shape: torch.Size([1, 1000])
* class ID: 335, label: fox squirrel, eastern fox squirrel, Sciurus niger
* confidence: 14.8308
```
The inference results of the original ResNet-50 model and cv.dnn.Net are equal. For the extended evaluation of the models we can use ``py_to_py_cls`` of the ``dnn_model_runner`` module. This module part will be described in the next subchapter.
### Evaluation of the Models
The ``dnn_model_runner`` module proposed in ``samples/dnn`` allows running the full evaluation pipeline on the ImageNet dataset and test execution for the following PyTorch classification models:
* alexnet
* vgg11
* vgg13
* vgg16
* vgg19
* resnet18
* resnet34
* resnet50
* resnet101
* resnet152
* squeezenet1_0
* squeezenet1_1
* resnext50_32x4d
* resnext101_32x8d
* wide_resnet50_2
* wide_resnet101_2
This list can be also extended with further appropriate evaluation pipeline configuration.
#### Evaluation Mode
The line below runs the module in evaluation mode:
```console
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name <pytorch_cls_model_name>
```
The classification model chosen from the list will be read into an OpenCV cv.dnn.Net object. Evaluation results of the PyTorch and OpenCV models (accuracy, inference time, L1) will be written into the log file. Inference time values will also be depicted in a chart to summarize the obtained model information.
Necessary evaluation configurations are defined in the [test_config.py](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) and can be modified in accordance with actual paths of data location:
```python
@dataclass
class TestClsConfig:
batch_size: int = 50
frame_size: int = 224
img_root_dir: str = "./ILSVRC2012_img_val"
# location of image-class matching
img_cls_file: str = "./val.txt"
bgr_to_rgb: bool = True
```
To initiate the evaluation of the PyTorch ResNet-50, run the following line:
```console
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name resnet50
```
After script launch, the log file with evaluation data will be generated in ``dnn_model_runner/dnn_conversion/logs``:
```console
The model PyTorch resnet50 was successfully obtained and converted to OpenCV DNN resnet50
===== Running evaluation of the model with the following params:
* val data location: ./ILSVRC2012_img_val
* log file location: dnn_model_runner/dnn_conversion/logs/PyTorch_resnet50_log.txt
```
#### Test Mode
The line below runs the module in test mode, i.e. it provides the steps for model inference:
```console
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name <pytorch_cls_model_name> --test True --default_img_preprocess <True/False> --evaluate False
```
Here the ``default_img_preprocess`` key defines whether you'd like to parametrize the model test process with particular values (for example, ``scale``, ``mean`` or ``std``) or use the default values.
Test configuration is represented in [test_config.py](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) ``TestClsModuleConfig`` class:
```python
@dataclass
class TestClsModuleConfig:
cls_test_data_dir: str = "../data"
test_module_name: str = "classification"
test_module_path: str = "classification.py"
input_img: str = os.path.join(cls_test_data_dir, "squirrel_cls.jpg")
model: str = ""
frame_height: str = str(TestClsConfig.frame_size)
frame_width: str = str(TestClsConfig.frame_size)
scale: str = "1.0"
mean: List[str] = field(default_factory=lambda: ["0.0", "0.0", "0.0"])
std: List[str] = field(default_factory=list)
crop: str = "False"
rgb: str = "True"
rsz_height: str = ""
rsz_width: str = ""
classes: str = os.path.join(cls_test_data_dir, "dnn", "classification_classes_ILSVRC2012.txt")
```
The default image preprocessing options are defined in [default_preprocess_config.py](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/default_preprocess_config.py). For instance:
```python
BASE_IMG_SCALE_FACTOR = 1 / 255.0
PYTORCH_RSZ_HEIGHT = 256
PYTORCH_RSZ_WIDTH = 256
pytorch_resize_input_blob = {
"mean": ["123.675", "116.28", "103.53"],
"scale": str(BASE_IMG_SCALE_FACTOR),
"std": ["0.229", "0.224", "0.225"],
"crop": "True",
"rgb": "True",
"rsz_height": str(PYTORCH_RSZ_HEIGHT),
"rsz_width": str(PYTORCH_RSZ_WIDTH)
}
```
The basis of the model testing is represented in [samples/dnn/classification.py](https://github.com/opencv/opencv/blob/master/samples/dnn/classification.py). ``classification.py`` can be executed autonomously with provided converted model in ``--input`` and populated parameters for cv.dnn.blobFromImage.
To reproduce from scratch the OpenCV steps described in "Model Conversion Pipeline" with ``dnn_model_runner``, execute the line below:
```console
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name resnet50 --test True --default_img_preprocess True --evaluate False
```
The network prediction is depicted in the top left corner of the output window:
![ResNet50 OpenCV inference output](images/pytorch_resnet50_opencv_test_res.jpg)


@@ -0,0 +1,360 @@
# Conversion of TensorFlow Classification Models and Launch with OpenCV Python {#tf_cls_tutorial_dnn_conversion}
| | |
| -: | :- |
| Original author | Anastasia Murzova |
| Compatibility | OpenCV >= 4.5 |
## Goals
In this tutorial you will learn how to:
* obtain frozen graphs of TensorFlow (TF) classification models
* run converted TensorFlow model with OpenCV Python API
* obtain an evaluation of the TensorFlow and OpenCV DNN models
We will explore the above-listed points by the example of MobileNet architecture.
## Introduction
Let's briefly review the key concepts involved in the pipeline of TensorFlow model transition with the OpenCV API. The initial step in the conversion of TensorFlow models into cv.dnn.Net
is obtaining the frozen TF model graph. A frozen graph defines the combination of the model graph structure with the kept values of the required variables, for example, the weights. Usually the frozen graph is saved in [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) (```.pb```) files.
After the model ``.pb`` file has been generated, it can be read with the cv.dnn.readNetFromTensorflow function.
## Requirements
To be able to experiment with the below code you will need to install a set of libraries. We will use a virtual environment with python3.7+ for this:
```console
virtualenv -p /usr/bin/python3.7 <env_dir_path>
source <env_dir_path>/bin/activate
```
For OpenCV-Python building from source, follow the corresponding instructions from the @ref tutorial_py_table_of_contents_setup.
Before you start the installation of the libraries, you can customize the [requirements.txt](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/requirements.txt), excluding or including (for example, ``opencv-python``) some dependencies.
The below line initiates requirements installation into the previously activated virtual environment:
```console
pip install -r requirements.txt
```
## Practice
In this part we are going to cover the following points:
1. create a TF classification model conversion pipeline and provide the inference
2. evaluate and test TF classification models
If you'd like merely to run evaluation or test model pipelines, the "Model Conversion Pipeline" tutorial part can be skipped.
### Model Conversion Pipeline
The code in this subchapter is located in the ``dnn_model_runner`` module and can be executed with the line:
```console
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_mobilenet
```
The following code contains the description of the below-listed steps:
1. instantiate TF model
2. create TF frozen graph
3. read TF frozen graph with OpenCV API
4. prepare input data
5. provide inference
```python
# initialize TF MobileNet model
original_tf_model = MobileNet(
include_top=True,
weights="imagenet"
)
# get TF frozen graph path
full_pb_path = get_tf_model_proto(original_tf_model)
# read frozen graph with OpenCV API
opencv_net = cv2.dnn.readNetFromTensorflow(full_pb_path)
print("OpenCV model was successfully read. Model layers: \n", opencv_net.getLayerNames())
# get preprocessed image
input_img = get_preprocessed_img("../data/squirrel_cls.jpg")
# get ImageNet labels
imagenet_labels = get_imagenet_labels("../data/dnn/classification_classes_ILSVRC2012.txt")
# obtain OpenCV DNN predictions
get_opencv_dnn_prediction(opencv_net, input_img, imagenet_labels)
# obtain TF model predictions
get_tf_dnn_prediction(original_tf_model, input_img, imagenet_labels)
```
To provide model inference we will use the below [squirrel photo](https://www.pexels.com/photo/brown-squirrel-eating-1564292) (under [CC0](https://www.pexels.com/terms-of-service/) license) corresponding to ImageNet class ID 335:
```console
fox squirrel, eastern fox squirrel, Sciurus niger
```
![Classification model input image](images/squirrel_cls.jpg)
For the label decoding of the obtained prediction, we also need ``imagenet_classes.txt`` file, which contains the full list of the ImageNet classes.
Let's go deeper into each step by the example of pretrained TF MobileNet:
* instantiate TF model:
```python
# initialize TF MobileNet model
original_tf_model = MobileNet(
include_top=True,
weights="imagenet"
)
```
* create TF frozen graph
```python
# define the directory for .pb model
pb_model_path = "models"
# define the name of .pb model
pb_model_name = "mobilenet.pb"
# create directory for further converted model
os.makedirs(pb_model_path, exist_ok=True)
# get model TF graph
tf_model_graph = tf.function(lambda x: tf_model(x))
# get concrete function
tf_model_graph = tf_model_graph.get_concrete_function(
tf.TensorSpec(tf_model.inputs[0].shape, tf_model.inputs[0].dtype))
# obtain frozen concrete function
frozen_tf_func = convert_variables_to_constants_v2(tf_model_graph)
# get frozen graph
frozen_tf_func.graph.as_graph_def()
# save full tf model
tf.io.write_graph(graph_or_graph_def=frozen_tf_func.graph,
logdir=pb_model_path,
name=pb_model_name,
as_text=False)
```
After the successful execution of the above code, we will get a frozen graph in ``models/mobilenet.pb``.
* read the TF frozen graph with cv.dnn.readNetFromTensorflow, passing the ``mobilenet.pb`` obtained in the previous step into it:
```python
# get TF frozen graph path
full_pb_path = get_tf_model_proto(original_tf_model)
```
* prepare input data with cv2.dnn.blobFromImage function:
```python
# read the image
input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
input_img = input_img.astype(np.float32)
# define preprocess parameters
mean = np.array([1.0, 1.0, 1.0]) * 127.5
scale = 1 / 127.5
# prepare input blob to fit the model input:
# 1. subtract mean
# 2. scale to set pixel values from 0 to 1
input_blob = cv2.dnn.blobFromImage(
image=input_img,
scalefactor=scale,
size=(224, 224), # img target size
mean=mean,
swapRB=True, # BGR -> RGB
crop=True # center crop
)
print("Input blob shape: {}\n".format(input_blob.shape))
```
Please pay attention to the preprocessing order in the cv2.dnn.blobFromImage function: first the mean value is subtracted, and only then are the pixel values multiplied by the defined scale.
Therefore, to reproduce the image preprocessing pipeline from the TF [``mobilenet.preprocess_input``](https://github.com/tensorflow/tensorflow/blob/02032fb477e9417197132648ec81e75beee9063a/tensorflow/python/keras/applications/mobilenet.py#L443-L445) function, we multiply ``mean`` by ``127.5``.
As a result, a 4-dimensional ``input_blob`` is obtained:
``Input blob shape: (1, 3, 224, 224)``
* provide OpenCV cv.dnn.Net inference:
```python
# set OpenCV DNN input
opencv_net.setInput(preproc_img)
# OpenCV DNN inference
out = opencv_net.forward()
print("OpenCV DNN prediction: \n")
print("* shape: ", out.shape)
# get the predicted class ID
imagenet_class_id = np.argmax(out)
# get confidence
confidence = out[0][imagenet_class_id]
print("* class ID: {}, label: {}".format(imagenet_class_id, imagenet_labels[imagenet_class_id]))
print("* confidence: {:.4f}\n".format(confidence))
```
After the above code execution we will get the following output:
```console
OpenCV DNN prediction:
* shape: (1, 1000)
* class ID: 335, label: fox squirrel, eastern fox squirrel, Sciurus niger
* confidence: 0.9525
```
* provide TF MobileNet inference:
```python
# inference
preproc_img = preproc_img.transpose(0, 2, 3, 1)
print("TF input blob shape: {}\n".format(preproc_img.shape))
out = original_net(preproc_img)
print("\nTensorFlow model prediction: \n")
print("* shape: ", out.shape)
# get the predicted class ID
imagenet_class_id = np.argmax(out)
print("* class ID: {}, label: {}".format(imagenet_class_id, imagenet_labels[imagenet_class_id]))
# get confidence
confidence = out[0][imagenet_class_id]
print("* confidence: {:.4f}".format(confidence))
```
To fit TF model input, ``input_blob`` was transposed:
```console
TF input blob shape: (1, 224, 224, 3)
```
TF inference results are the following:
```console
TensorFlow model prediction:
* shape: (1, 1000)
* class ID: 335, label: fox squirrel, eastern fox squirrel, Sciurus niger
* confidence: 0.9525
```
As can be seen from the experiments, the OpenCV and TF inference results are equal.
### Evaluation of the Models
The ``dnn_model_runner`` module proposed in ``samples/dnn`` allows running the full evaluation pipeline on the ImageNet dataset and test execution for the following TensorFlow classification models:
* vgg16
* vgg19
* resnet50
* resnet101
* resnet152
* densenet121
* densenet169
* densenet201
* inceptionresnetv2
* inceptionv3
* mobilenet
* mobilenetv2
* nasnetlarge
* nasnetmobile
* xception
This list can also be extended with further appropriate evaluation pipeline configuration.
#### Evaluation Mode
The below line represents running the module in the evaluation mode:
```console
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_cls --model_name <tf_cls_model_name>
```
The classification model chosen from the list will be read into an OpenCV ``cv.dnn_Net`` object. Evaluation results of the TF and OpenCV models (accuracy, inference time, L1) will be written into the log file. Inference time values will also be depicted in a chart to generalize the obtained model information.
Necessary evaluation configurations are defined in the [test_config.py](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) and can be modified in accordance with the actual data location paths:
```python
@dataclass
class TestClsConfig:
batch_size: int = 50
frame_size: int = 224
img_root_dir: str = "./ILSVRC2012_img_val"
# location of image-class matching
img_cls_file: str = "./val.txt"
bgr_to_rgb: bool = True
```
The values from ``TestClsConfig`` can be customized in accordance with the chosen model.
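For instance, a custom configuration pointing at a locally stored validation set could be constructed as follows (the paths are placeholders, and the import path is an assumption based on the repository layout referenced above):
```python
# assumed import path; TestClsConfig lives in the test_config.py referenced above
from dnn_model_runner.dnn_conversion.common.test.configs.test_config import TestClsConfig

custom_cls_config = TestClsConfig(
    batch_size=25,
    img_root_dir="/path/to/ILSVRC2012_img_val",
    img_cls_file="/path/to/val.txt",
)
print(custom_cls_config)
```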
To initiate the evaluation of the TensorFlow MobileNet, run the following line:
```console
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_cls --model_name mobilenet
```
After script launch, the log file with evaluation data will be generated in ``dnn_model_runner/dnn_conversion/logs``:
```console
===== Running evaluation of the model with the following params:
* val data location: ./ILSVRC2012_img_val
* log file location: dnn_model_runner/dnn_conversion/logs/TF_mobilenet_log.txt
```
#### Test Mode
The below line represents running the module in the test mode, namely it provides the steps for the model inference:
```console
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_cls --model_name <tf_cls_model_name> --test True --default_img_preprocess <True/False> --evaluate False
```
Here the ``default_img_preprocess`` key defines whether you'd like to parametrize the model test process with some particular values or use the default values, for example, for ``scale``, ``mean`` or ``std``.
Test configuration is represented in [test_config.py](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) ``TestClsModuleConfig`` class:
```python
@dataclass
class TestClsModuleConfig:
cls_test_data_dir: str = "../data"
test_module_name: str = "classification"
test_module_path: str = "classification.py"
input_img: str = os.path.join(cls_test_data_dir, "squirrel_cls.jpg")
model: str = ""
frame_height: str = str(TestClsConfig.frame_size)
frame_width: str = str(TestClsConfig.frame_size)
scale: str = "1.0"
mean: List[str] = field(default_factory=lambda: ["0.0", "0.0", "0.0"])
std: List[str] = field(default_factory=list)
crop: str = "False"
rgb: str = "True"
rsz_height: str = ""
rsz_width: str = ""
classes: str = os.path.join(cls_test_data_dir, "dnn", "classification_classes_ILSVRC2012.txt")
```
The default image preprocessing options are defined in ``default_preprocess_config.py``. For instance, for MobileNet:
```python
tf_input_blob = {
"mean": ["127.5", "127.5", "127.5"],
"scale": str(1 / 127.5),
"std": [],
"crop": "True",
"rgb": "True"
}
```
The basis of the model testing is represented in [samples/dnn/classification.py](https://github.com/opencv/opencv/blob/master/samples/dnn/classification.py). ``classification.py`` can be executed autonomously with the converted model passed in ``--model``, the test image in ``--input``, and the populated parameters for cv.dnn.blobFromImage.
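For instance, an autonomous run for the converted MobileNet graph could look roughly as follows; the flag names mirror the ``TestClsModuleConfig`` fields and the MobileNet ``tf_input_blob`` defaults shown above, the file paths are placeholders, and the exact set of accepted options should be checked with ``python classification.py --help``:
```console
python classification.py --model models/mobilenet.pb \
                         --input ../data/squirrel_cls.jpg \
                         --width 224 --height 224 \
                         --scale 0.007843 --mean 127.5 127.5 127.5 \
                         --rgb --crop True \
                         --classes ../data/dnn/classification_classes_ILSVRC2012.txt
```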
To reproduce from scratch the OpenCV steps described in "Model Conversion Pipeline" with ``dnn_model_runner``, execute the below line:
```console
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_cls --model_name mobilenet --test True --default_img_preprocess True --evaluate False
```
The network prediction is depicted in the top left corner of the output window:
![TF MobileNet OpenCV inference output](images/tf_mobilenet_opencv_test_res.jpg)


View File

@@ -0,0 +1,140 @@
# Conversion of TensorFlow Detection Models and Launch with OpenCV Python {#tf_det_tutorial_dnn_conversion}
| | |
| -: | :- |
| Original author | Anastasia Murzova |
| Compatibility | OpenCV >= 4.5 |
## Goals
In this tutorial you will learn how to:
* obtain frozen graphs of TensorFlow (TF) detection models
* run converted TensorFlow model with OpenCV Python API
We will explore the above-listed points by the example of SSD MobileNetV1.
## Introduction
Let's briefly view the key concepts involved in the pipeline of TensorFlow models transition with OpenCV API. The initial step in the conversion of TensorFlow models into cv.dnn.Net
is obtaining the frozen TF model graph. A frozen graph defines the combination of the model graph structure with kept values of the required variables, for example, weights. The frozen graph is saved in [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) (```.pb```) files.
There are special functions for reading ``.pb`` graphs in OpenCV: cv.dnn.readNetFromTensorflow and cv.dnn.readNet.
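A minimal sketch of both reading options (the file names below are placeholders; the optional text graph argument is covered later in this tutorial):
```python
import cv2 as cv

# read a frozen TF graph, optionally together with a text graph (.pbtxt)
net = cv.dnn.readNetFromTensorflow("frozen_inference_graph.pb", "graph_config.pbtxt")

# or let cv.dnn.readNet deduce the framework from the file extensions
net = cv.dnn.readNet("frozen_inference_graph.pb", "graph_config.pbtxt")
```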
## Requirements
To be able to experiment with the below code you will need to install a set of libraries. We will use a virtual environment with python3.7+ for this:
```console
virtualenv -p /usr/bin/python3.7 <env_dir_path>
source <env_dir_path>/bin/activate
```
For OpenCV-Python building from source, follow the corresponding instructions from the @ref tutorial_py_table_of_contents_setup.
Before you start the installation of the libraries, you can customize the [requirements.txt](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/requirements.txt), excluding or including (for example, ``opencv-python``) some dependencies.
The below line initiates requirements installation into the previously activated virtual environment:
```console
pip install -r requirements.txt
```
## Practice
In this part we are going to cover the following points:
1. obtain a frozen graph of the TF SSD MobileNetV1 detection model
2. run the converted model with OpenCV and process the prediction results
### Model Preparation
The code in this subchapter is located in the ``samples/dnn/dnn_model_runner`` module and can be executed with the below line:
```console
python -m dnn_model_runner.dnn_conversion.tf.detection.py_to_py_ssd_mobilenet
```
The following code contains the steps of the TF SSD MobileNetV1 model retrieval:
```python
tf_model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
graph_extraction_dir = "./"
frozen_graph_path = extract_tf_frozen_graph(tf_model_name, graph_extraction_dir)
print("Frozen graph path for {}: {}".format(tf_model_name, frozen_graph_path))
```
In the ``extract_tf_frozen_graph`` function we extract the ``frozen_inference_graph.pb`` provided in the model archive for further processing:
```python
# define model archive name
tf_model_tar = model_name + '.tar.gz'
# define link to retrieve model archive
model_link = DETECTION_MODELS_URL + tf_model_tar
tf_frozen_graph_name = 'frozen_inference_graph'
try:
urllib.request.urlretrieve(model_link, tf_model_tar)
except Exception:
print("TF {} was not retrieved: {}".format(model_name, model_link))
return
print("TF {} was retrieved.".format(model_name))
tf_model_tar = tarfile.open(tf_model_tar)
frozen_graph_path = ""
for model_tar_elem in tf_model_tar.getmembers():
if tf_frozen_graph_name in os.path.basename(model_tar_elem.name):
tf_model_tar.extract(model_tar_elem, extracted_model_path)
frozen_graph_path = os.path.join(extracted_model_path, model_tar_elem.name)
break
tf_model_tar.close()
```
After the successful execution of the above code we will get the following output:
```console
TF ssd_mobilenet_v1_coco_2017_11_17 was retrieved.
Frozen graph path for ssd_mobilenet_v1_coco_2017_11_17: ./ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb
```
To provide model inference we will use the below [double-decker bus photo](https://www.pexels.com/photo/bus-and-car-on-one-way-street-3626589/) (under [Pexels](https://www.pexels.com/license/) license):
![Double-decker bus](images/pexels_double_decker_bus.jpg)
To initiate the test process we need to provide an appropriate model configuration. We will use [``ssd_mobilenet_v1_coco.config``](https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v1_coco.config) from [TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection#tensorflow-object-detection-api).
TensorFlow Object Detection API framework contains helpful mechanisms for object detection model manipulations.
We will use this configuration to provide a text graph representation. To generate ``.pbtxt`` we will use the corresponding [``samples/dnn/tf_text_graph_ssd.py``](https://github.com/opencv/opencv/blob/master/samples/dnn/tf_text_graph_ssd.py) script:
```console
python tf_text_graph_ssd.py --input ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb --config ssd_mobilenet_v1_coco_2017_11_17/ssd_mobilenet_v1_coco.config --output ssd_mobilenet_v1_coco_2017_11_17.pbtxt
```
After successful execution ``ssd_mobilenet_v1_coco_2017_11_17.pbtxt`` will be created.
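If you'd like to check the converted graph directly with the OpenCV Python API before turning to ``object_detection.py``, a minimal sketch could look as follows. The preprocessing values follow the ``ssd_tf`` entry of ``models.yml`` shown below, the image path is a placeholder, and parsing the output assumes the standard SSD detection layout ``(1, 1, N, 7)`` with rows ``[image_id, class_id, confidence, left, top, right, bottom]`` in relative coordinates:
```python
import cv2 as cv

# read the frozen graph together with the generated text graph
net = cv.dnn.readNetFromTensorflow(
    "ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb",
    "ssd_mobilenet_v1_coco_2017_11_17.pbtxt"
)

img = cv.imread("../data/pexels_double_decker_bus.jpg")
h, w = img.shape[:2]

blob = cv.dnn.blobFromImage(img, scalefactor=1.0, size=(300, 300), swapRB=True, crop=False)
net.setInput(blob)
detections = net.forward()  # assumed shape: (1, 1, N, 7)

for det in detections[0, 0]:
    confidence = float(det[2])
    if confidence > 0.5:
        # relative coordinates -> pixel coordinates
        left, top, right, bottom = [int(v) for v in det[3:7] * [w, h, w, h]]
        cv.rectangle(img, (left, top), (right, bottom), (0, 255, 0), 2)

cv.imshow("detections", img)
cv.waitKey(0)
```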
Before we run ``object_detection.py``, let's have a look at the default values for the SSD MobileNetV1 test process configuration. They are located in [``models.yml``](https://github.com/opencv/opencv/blob/master/samples/dnn/models.yml):
```yml
ssd_tf:
model: "ssd_mobilenet_v1_coco_2017_11_17.pb"
config: "ssd_mobilenet_v1_coco_2017_11_17.pbtxt"
mean: [0, 0, 0]
scale: 1.0
width: 300
height: 300
rgb: true
classes: "object_detection_classes_coco.txt"
sample: "object_detection"
```
To fetch these values via the ``ssd_tf`` alias, we need to provide the frozen graph ``ssd_mobilenet_v1_coco_2017_11_17.pb`` and the text graph ``ssd_mobilenet_v1_coco_2017_11_17.pbtxt``:
```console
python object_detection.py ssd_tf --input ../data/pexels_double_decker_bus.jpg
```
This line is equivalent to:
```console
python object_detection.py --model ssd_mobilenet_v1_coco_2017_11_17.pb --config ssd_mobilenet_v1_coco_2017_11_17.pbtxt --input ../data/pexels_double_decker_bus.jpg --width 300 --height 300 --classes ../data/dnn/object_detection_classes_coco.txt
```
The result is:
![OpenCV SSD bus result](images/opencv_bus_res.jpg)
There are several other helpful parameters that can be customized to correct the results: the confidence threshold (``--thr``) and the non-maximum suppression (``--nms``) values.
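For example, to make the detector stricter, both values can be passed explicitly on the command line:
```console
python object_detection.py ssd_tf --input ../data/pexels_double_decker_bus.jpg --thr 0.6 --nms 0.3
```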

View File

@@ -0,0 +1,332 @@
# Conversion of PyTorch Segmentation Models and Launch with OpenCV {#pytorch_segm_tutorial_dnn_conversion}
## Goals
In this tutorial you will learn how to:
* convert PyTorch segmentation models
* run converted PyTorch model with OpenCV
* obtain an evaluation of the PyTorch and OpenCV DNN models
We will explore the above-listed points by the example of the FCN ResNet-50 architecture.
## Introduction
The key points involved in the transition pipeline of the [PyTorch classification](https://link_to_cls_tutorial) and segmentation models with the OpenCV API are the same. The first step is transferring the model into [ONNX](https://onnx.ai/about.html) format with the PyTorch [``torch.onnx.export``](https://pytorch.org/docs/stable/onnx.html#torch.onnx.export) built-in function.
Further, the obtained ``.onnx`` model is passed into cv.dnn.readNetFromONNX, which returns a cv.dnn.Net object ready for DNN manipulations.
## Practice
In this part we are going to cover the following points:
1. create a segmentation model conversion pipeline and provide the inference
2. evaluate and test segmentation models
If you'd like merely to run evaluation or test model pipelines, the "Model Conversion Pipeline" part can be skipped.
### Model Conversion Pipeline
The code in this subchapter is located in the ``dnn_model_runner`` module and can be executed with the line:
```
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_fcnresnet50
```
The following code contains the description of the below-listed steps:
1. instantiate PyTorch model
2. convert PyTorch model into ``.onnx``
3. read the transferred network with OpenCV API
4. prepare input data
5. provide inference
6. get colored masks from predictions
7. visualize results
```python
# initialize PyTorch FCN ResNet-50 model
original_model = models.segmentation.fcn_resnet50(pretrained=True)
# get the path to the converted into ONNX PyTorch model
full_model_path = get_pytorch_onnx_model(original_model)
# read converted .onnx model with OpenCV API
opencv_net = cv2.dnn.readNetFromONNX(full_model_path)
print("OpenCV model was successfully read. Layer IDs: \n", opencv_net.getLayerNames())
# get preprocessed image
img, input_img = get_processed_imgs("test_data/sem_segm/2007_000033.jpg")
# obtain OpenCV DNN predictions
opencv_prediction = get_opencv_dnn_prediction(opencv_net, input_img)
# obtain original PyTorch ResNet50 predictions
pytorch_prediction = get_pytorch_dnn_prediction(original_model, input_img)
pascal_voc_classes, pascal_voc_colors = read_colors_info("test_data/sem_segm/pascal-classes.txt")
# obtain colored segmentation masks
opencv_colored_mask = get_colored_mask(img.shape, opencv_prediction, pascal_voc_colors)
pytorch_colored_mask = get_colored_mask(img.shape, pytorch_prediction, pascal_voc_colors)
# obtain palette of PASCAL VOC colors
color_legend = get_legend(pascal_voc_classes, pascal_voc_colors)
cv2.imshow('PyTorch Colored Mask', pytorch_colored_mask)
cv2.imshow('OpenCV DNN Colored Mask', opencv_colored_mask)
cv2.imshow('Color Legend', color_legend)
cv2.waitKey(0)
```
To provide the model inference we will use the below picture from the [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) validation dataset:
![PASCAL VOC img](images/2007_000033.jpg)
The target segmented result is:
![PASCAL VOC ground truth](images/2007_000033.png)
To decode the PASCAL VOC colors and map them onto the predicted masks, we also need the ``pascal-classes.txt`` file, which contains the full list of the PASCAL VOC classes and the corresponding colors.
Let's go deeper into each code step by the example of pretrained PyTorch FCN ResNet-50:
* instantiate PyTorch FCN ResNet-50 model:
```python
# initialize PyTorch FCN ResNet-50 model
original_model = models.segmentation.fcn_resnet50(pretrained=True)
```
* convert PyTorch model into ONNX format:
```python
# define the directory for further converted model save
onnx_model_path = "models"
# define the name of further converted model
onnx_model_name = "fcnresnet50.onnx"
# create directory for further converted model
os.makedirs(onnx_model_path, exist_ok=True)
# get full path to the converted model
full_model_path = os.path.join(onnx_model_path, onnx_model_name)
# generate model input to build the graph
generated_input = Variable(
torch.randn(1, 3, 500, 500)
)
# model export into ONNX format
torch.onnx.export(
original_model,
generated_input,
full_model_path,
verbose=True,
input_names=["input"],
output_names=["output"],
opset_version=11
)
```
The code from this step does not differ from the classification conversion case. Thus, after the successful execution of the above code, we will get ``models/fcnresnet50.onnx``.
* read the transferred network with cv.dnn.readNetFromONNX passing the obtained in the previous step ONNX model into it:
```python
# read converted .onnx model with OpenCV API
opencv_net = cv2.dnn.readNetFromONNX(full_model_path)
```
* prepare input data:
```python
# read the image
input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
input_img = input_img.astype(np.float32)
# target image sizes
img_height = input_img.shape[0]
img_width = input_img.shape[1]
# define preprocess parameters
mean = np.array([0.485, 0.456, 0.406]) * 255.0
scale = 1 / 255.0
std = [0.229, 0.224, 0.225]
# prepare input blob to fit the model input:
# 1. subtract mean
# 2. scale to set pixel values from 0 to 1
input_blob = cv2.dnn.blobFromImage(
image=input_img,
scalefactor=scale,
size=(img_width, img_height), # img target size
mean=mean,
swapRB=True, # BGR -> RGB
crop=False # center crop
)
# 3. divide by std
input_blob[0] /= np.asarray(std, dtype=np.float32).reshape(3, 1, 1)
```
In this step we read the image and prepare the model input with the cv2.dnn.blobFromImage function, which returns a 4-dimensional blob.
It should be noted that in ``cv2.dnn.blobFromImage`` the mean value is subtracted first, and only then are the pixel values scaled. Thus, ``mean`` is multiplied by ``255.0`` to reproduce the original image preprocessing order:
```python
img /= 255.0
img -= [0.485, 0.456, 0.406]
img /= [0.229, 0.224, 0.225]
```
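The equivalence of the two orders (including the extra division by ``std`` applied to the blob above) can be verified on a single pixel value; this is a small standalone sketch, not part of the tutorial scripts:
```python
import numpy as np

pixel = np.float32([200.0, 50.0, 125.0])
mean = np.float32([0.485, 0.456, 0.406])
std = np.float32([0.229, 0.224, 0.225])

# original PyTorch order: scale to [0, 1], subtract mean, divide by std
torch_style = (pixel / 255.0 - mean) / std

# blobFromImage order: subtract mean * 255, then scale by 1 / 255, then divide by std
opencv_style = ((pixel - mean * 255.0) * (1 / 255.0)) / std

assert np.allclose(torch_style, opencv_style, atol=1e-6)
```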
* OpenCV ``cv.dnn_Net`` inference:
```python
# set OpenCV DNN input
opencv_net.setInput(preproc_img)
# OpenCV DNN inference
out = opencv_net.forward()
print("OpenCV DNN segmentation prediction: \n")
print("* shape: ", out.shape)
# get IDs of predicted classes
out_predictions = np.argmax(out[0], axis=0)
```
After the above code execution we will get the following output:
```
OpenCV DNN segmentation prediction:
* shape: (1, 21, 500, 500)
```
Each of the 21 prediction channels, where 21 is the number of PASCAL VOC classes, contains probabilities indicating how likely the pixel corresponds to the given PASCAL VOC class.
* PyTorch FCN ResNet-50 model inference:
```python
original_net.eval()
preproc_img = torch.FloatTensor(preproc_img)
with torch.no_grad():
# obtaining unnormalized probabilities for each class
out = original_net(preproc_img)['out']
print("\nPyTorch segmentation model prediction: \n")
print("* shape: ", out.shape)
# get IDs of predicted classes
out_predictions = out[0].argmax(dim=0)
```
After launching the above code we will get the following output:
```
PyTorch segmentation model prediction:
* shape: torch.Size([1, 21, 366, 500])
```
PyTorch prediction also contains probabilities corresponding to each class prediction.
* get colored masks from predictions:
```python
# convert mask values into PASCAL VOC colors
processed_mask = np.stack([colors[color_id] for color_id in segm_mask.flatten()])
# reshape mask into 3-channel image
processed_mask = processed_mask.reshape(mask_height, mask_width, 3)
processed_mask = cv2.resize(processed_mask, (img_width, img_height), interpolation=cv2.INTER_NEAREST).astype(
np.uint8)
# convert colored mask from BGR to RGB for compatibility with PASCAL VOC colors
processed_mask = cv2.cvtColor(processed_mask, cv2.COLOR_BGR2RGB)
```
In this step we map the class IDs predicted in the segmentation masks to the appropriate colors of the predicted classes. Let's have a look at the results:
![OpenCV Colored Mask](images/legend_opencv_color_mask.png)
For the extended evaluation of the models, we can use ``py_to_py_segm`` script of the ``dnn_model_runner`` module. This module part will be described in the next subchapter.
### Evaluation of the Models
The ``dnn_model_runner`` module proposed in ``dnn/samples`` allows running the full evaluation pipeline on the PASCAL VOC dataset and test execution for the following PyTorch segmentation models:
* FCN ResNet-50
* FCN ResNet-101
This list can also be extended with further appropriate evaluation pipeline configuration.
#### Evaluation Mode
The below line represents running the module in the evaluation mode:
```
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_segm --model_name <pytorch_segm_model_name>
```
The segmentation model chosen from the list will be read into an OpenCV ``cv.dnn_Net`` object. Evaluation results of the PyTorch and OpenCV models (pixel accuracy, mean IoU, inference time) will be written into the log file. Inference time values will also be depicted in a chart to generalize the obtained model information.
Necessary evaluation configurations are defined in the [``test_config.py``](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py):
```python
@dataclass
class TestSegmConfig:
frame_size: int = 500
img_root_dir: str = "./VOC2012"
img_dir: str = os.path.join(img_root_dir, "JPEGImages/")
img_segm_gt_dir: str = os.path.join(img_root_dir, "SegmentationClass/")
# reduced val: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/data/pascal/seg11valid.txt
segm_val_file: str = os.path.join(img_root_dir, "ImageSets/Segmentation/seg11valid.txt")
colour_file_cls: str = os.path.join(img_root_dir, "ImageSets/Segmentation/pascal-classes.txt")
```
These values can be modified in accordance with the chosen model pipeline.
To initiate the evaluation of the PyTorch FCN ResNet-50, run the following line:
```
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_segm --model_name fcnresnet50
```
#### Test Mode
The below line represents running the module in the test mode, which provides the steps for the model inference:
```
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_segm --model_name <pytorch_segm_model_name> --test True --default_img_preprocess <True/False> --evaluate False
```
Here the ``default_img_preprocess`` key defines whether you'd like to parametrize the model test process with some particular values or use the default values, for example, for ``scale``, ``mean`` or ``std``.
Test configuration is represented in [``test_config.py``](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) ``TestSegmModuleConfig`` class:
```python
@dataclass
class TestSegmModuleConfig:
segm_test_data_dir: str = "test_data/sem_segm"
test_module_name: str = "segmentation"
test_module_path: str = "segmentation.py"
input_img: str = os.path.join(segm_test_data_dir, "2007_000033.jpg")
model: str = ""
frame_height: str = str(TestSegmConfig.frame_size)
frame_width: str = str(TestSegmConfig.frame_size)
scale: float = 1.0
mean: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
std: List[float] = field(default_factory=list)
crop: bool = False
rgb: bool = True
classes: str = os.path.join(segm_test_data_dir, "pascal-classes.txt")
```
The default image preprocessing options are defined in ``default_preprocess_config.py``:
```python
pytorch_segm_input_blob = {
"mean": ["123.675", "116.28", "103.53"],
"scale": str(1 / 255.0),
"std": ["0.229", "0.224", "0.225"],
"crop": "False",
"rgb": "True"
}
```
The basis of the model testing is represented in ``samples/dnn/segmentation.py``. ``segmentation.py`` can be executed autonomously with the converted model passed in ``--model``, the test image in ``--input``, and the populated parameters for ``cv2.dnn.blobFromImage``.
To reproduce from scratch the OpenCV steps described in "Model Conversion Pipeline" with ``dnn_model_runner``, execute the below line:
```
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_segm --model_name fcnresnet50 --test True --default_img_preprocess True --evaluate False
```

View File

@@ -0,0 +1,406 @@
# Conversion of TensorFlow Segmentation Models and Launch with OpenCV {#tf_segm_tutorial_dnn_conversion}
## Goals
In this tutorial you will learn how to:
* convert TensorFlow (TF) segmentation models
* run converted TensorFlow model with OpenCV
* obtain an evaluation of the TensorFlow and OpenCV DNN models
We will explore the above-listed points by the example of the DeepLab architecture.
## Introduction
The key concepts involved in the transition pipeline of the [TensorFlow classification](https://link_to_cls_tutorial) and segmentation models with the OpenCV API are almost the same, except for the graph optimization phase. The initial step in the conversion of TensorFlow models into cv.dnn.Net
is obtaining the frozen TF model graph. A frozen graph defines the combination of the model graph structure with the kept values of the required variables, for example, weights. Usually the frozen graph is saved in [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) (```.pb```) files.
To read the generated segmentation model ``.pb`` file with cv.dnn.readNetFromTensorflow, the graph needs to be modified with the TF [graph transform tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms).
## Practice
In this part we are going to cover the following points:
1. create a TF segmentation model conversion pipeline and provide the inference
2. evaluate and test TF segmentation models
If you'd like merely to run evaluation or test model pipelines, the "Model Conversion Pipeline" tutorial part can be skipped.
### Model Conversion Pipeline
The code in this subchapter is located in the ``dnn_model_runner`` module and can be executed with the line:
```
python -m dnn_model_runner.dnn_conversion.tf.segmentation.py_to_py_deeplab
```
TensorFlow segmentation models can be found in [TensorFlow Research Models](https://github.com/tensorflow/models/tree/master/research/#tensorflow-research-models) section, which contains the implementations of models on the basis of published research papers.
We will retrieve the archive with the pre-trained TF DeepLabV3 from the below link:
```
http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz
```
The full pipeline for obtaining the frozen graph is described in ``deeplab_retrievement.py``:
```python
def get_deeplab_frozen_graph():
# define model path to download
models_url = 'http://download.tensorflow.org/models/'
mobilenetv2_voctrainval = 'deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz'
# construct model link to download
model_link = models_url + mobilenetv2_voctrainval
try:
urllib.request.urlretrieve(model_link, mobilenetv2_voctrainval)
except Exception:
print("TF DeepLabV3 was not retrieved: {}".format(model_link))
return
tf_model_tar = tarfile.open(mobilenetv2_voctrainval)
# iterate the obtained model archive
for model_tar_elem in tf_model_tar.getmembers():
# check whether the model archive contains frozen graph
if TF_FROZEN_GRAPH_NAME in os.path.basename(model_tar_elem.name):
# extract frozen graph
tf_model_tar.extract(model_tar_elem, FROZEN_GRAPH_PATH)
tf_model_tar.close()
```
After running this script:
```
python -m dnn_model_runner.dnn_conversion.tf.segmentation.deeplab_retrievement
```
we will get ``frozen_inference_graph.pb`` in ``deeplab/deeplabv3_mnv2_pascal_trainval``.
Before loading the network with OpenCV, the extracted ``frozen_inference_graph.pb`` needs to be optimized.
To optimize the graph we use TF ``TransformGraph`` with default parameters:
```python
DEFAULT_OPT_GRAPH_NAME = "optimized_frozen_inference_graph.pb"
DEFAULT_INPUTS = "sub_7"
DEFAULT_OUTPUTS = "ResizeBilinear_3"
DEFAULT_TRANSFORMS = "remove_nodes(op=Identity)" \
" merge_duplicate_nodes" \
" strip_unused_nodes" \
" fold_constants(ignore_errors=true)" \
" fold_batch_norms" \
" fold_old_batch_norms"
def optimize_tf_graph(
in_graph,
out_graph=DEFAULT_OPT_GRAPH_NAME,
inputs=DEFAULT_INPUTS,
outputs=DEFAULT_OUTPUTS,
transforms=DEFAULT_TRANSFORMS,
is_manual=True,
was_optimized=True
):
# ...
tf_opt_graph = TransformGraph(
tf_graph,
inputs,
outputs,
transforms
)
```
To run graph optimization process, execute the line:
```
python -m dnn_model_runner.dnn_conversion.tf.segmentation.tf_graph_optimizer --in_graph deeplab/deeplabv3_mnv2_pascal_trainval/frozen_inference_graph.pb
```
As a result, the ``deeplab/deeplabv3_mnv2_pascal_trainval`` directory will contain ``optimized_frozen_inference_graph.pb``.
After we have obtained the model graphs, let's examine the below-listed steps:
1. read TF ``frozen_inference_graph.pb`` graph
2. read optimized TF frozen graph with OpenCV API
3. prepare input data
4. provide inference
5. get colored masks from predictions
6. visualize results
```python
# get TF model graph from the obtained frozen graph
deeplab_graph = read_deeplab_frozen_graph(deeplab_frozen_graph_path)
# read DeepLab frozen graph with OpenCV API
opencv_net = cv2.dnn.readNetFromTensorflow(opt_deeplab_frozen_graph_path)
print("OpenCV model was successfully read. Model layers: \n", opencv_net.getLayerNames())
# get processed image
original_img_shape, tf_input_blob, opencv_input_img = get_processed_imgs("test_data/sem_segm/2007_000033.jpg")
# obtain OpenCV DNN predictions
opencv_prediction = get_opencv_dnn_prediction(opencv_net, opencv_input_img)
# obtain TF model predictions
tf_prediction = get_tf_dnn_prediction(deeplab_graph, tf_input_blob)
# get PASCAL VOC classes and colors
pascal_voc_classes, pascal_voc_colors = read_colors_info("test_data/sem_segm/pascal-classes.txt")
# obtain colored segmentation masks
opencv_colored_mask = get_colored_mask(original_img_shape, opencv_prediction, pascal_voc_colors)
tf_colored_mask = get_tf_colored_mask(original_img_shape, tf_prediction, pascal_voc_colors)
# obtain palette of PASCAL VOC colors
color_legend = get_legend(pascal_voc_classes, pascal_voc_colors)
cv2.imshow('TensorFlow Colored Mask', tf_colored_mask)
cv2.imshow('OpenCV DNN Colored Mask', opencv_colored_mask)
cv2.imshow('Color Legend', color_legend)
```
To provide the model inference we will use the below picture from the [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) validation dataset:
![PASCAL VOC img](images/2007_000033.jpg)
The target segmented result is:
![PASCAL VOC ground truth](images/2007_000033.png)
To decode the PASCAL VOC colors and map them onto the predicted masks, we also need the ``pascal-classes.txt`` file, which contains the full list of the PASCAL VOC classes and the corresponding colors.
Let's go deeper into each step by the example of pretrained TF DeepLabV3 MobileNetV2:
* read the TF ``frozen_inference_graph.pb`` graph:
```python
# init deeplab model graph
model_graph = tf.Graph()
# obtain the graph definition from the serialized frozen graph
with tf.io.gfile.GFile(frozen_graph_path, 'rb') as graph_file:
tf_model_graph = GraphDef()
tf_model_graph.ParseFromString(graph_file.read())
with model_graph.as_default():
tf.import_graph_def(tf_model_graph, name='')
```
* read optimized TF frozen graph with OpenCV API:
```python
# read DeepLab frozen graph with OpenCV API
opencv_net = cv2.dnn.readNetFromTensorflow(opt_deeplab_frozen_graph_path)
```
* prepare input data with cv2.dnn.blobFromImage function:
```python
# read the image
input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
input_img = input_img.astype(np.float32)
# preprocess image for TF model input
tf_preproc_img = cv2.resize(input_img, (513, 513))
tf_preproc_img = cv2.cvtColor(tf_preproc_img, cv2.COLOR_BGR2RGB)
# define preprocess parameters for OpenCV DNN
mean = np.array([1.0, 1.0, 1.0]) * 127.5
scale = 1 / 127.5
# prepare input blob to fit the model input:
# 1. subtract mean
# 2. scale to set pixel values from 0 to 1
input_blob = cv2.dnn.blobFromImage(
image=input_img,
scalefactor=scale,
size=(513, 513), # img target size
mean=mean,
swapRB=True, # BGR -> RGB
crop=False # center crop
)
```
Please pay attention to the preprocessing order in the ``cv2.dnn.blobFromImage`` function: first the mean value is subtracted, and only then are the pixel values multiplied by the defined scale.
Therefore, to reproduce the TF image preprocessing pipeline, we multiply ``mean`` by ``127.5``.
Another important point is the image preprocessing for TF DeepLab. To pass the image into the TF model we only need to construct an appropriate shape; the rest of the image preprocessing is described in [feature_extractor.py](https://github.com/tensorflow/models/blob/master/research/deeplab/core/feature_extractor.py) and will be invoked automatically.
* provide OpenCV ``cv.dnn_Net`` inference:
```python
# set OpenCV DNN input
opencv_net.setInput(preproc_img)
# OpenCV DNN inference
out = opencv_net.forward()
print("OpenCV DNN segmentation prediction: \n")
print("* shape: ", out.shape)
# get IDs of predicted classes
out_predictions = np.argmax(out[0], axis=0)
```
After the above code execution we will get the following output:
```
OpenCV DNN segmentation prediction:
* shape: (1, 21, 513, 513)
```
Each of the 21 prediction channels, where 21 is the number of PASCAL VOC classes, contains probabilities indicating how likely the pixel corresponds to the given PASCAL VOC class.
* provide TF model inference:
```python
# add batch dimension; DeepLab's ImageTensor input expects a uint8 tensor of shape (1, H, W, 3)
preproc_img = np.expand_dims(preproc_img.astype(np.uint8), 0)
# init TF session
tf_session = Session(graph=model_graph)
input_tensor_name = "ImageTensor:0"
output_tensor_name = "SemanticPredictions:0"
# run inference
out = tf_session.run(
    output_tensor_name,
    feed_dict={input_tensor_name: preproc_img}
)
print("TF segmentation model prediction: \n")
print("* shape: ", out.shape)
```
TF inference results are the following:
```
TF segmentation model prediction:
* shape: (1, 513, 513)
```
TensorFlow prediction contains the indexes of corresponding PASCAL VOC classes.
* transform OpenCV prediction into colored mask:
```python
mask_height = segm_mask.shape[0]
mask_width = segm_mask.shape[1]
img_height = original_img_shape[0]
img_width = original_img_shape[1]
# convert mask values into PASCAL VOC colors
processed_mask = np.stack([colors[color_id] for color_id in segm_mask.flatten()])
# reshape mask into 3-channel image
processed_mask = processed_mask.reshape(mask_height, mask_width, 3)
processed_mask = cv2.resize(processed_mask, (img_width, img_height), interpolation=cv2.INTER_NEAREST).astype(
np.uint8)
# convert colored mask from BGR to RGB
processed_mask = cv2.cvtColor(processed_mask, cv2.COLOR_BGR2RGB)
```
In this step we map the class IDs predicted in the segmentation masks to the appropriate colors of the predicted classes. Let's have a look at the results:
![Color Legend](images/colors_legend.png)
![OpenCV Colored Mask](images/deeplab_opencv_colored_mask.png)
* transform TF prediction into colored mask:
```python
colors = np.array(colors)
processed_mask = colors[segm_mask[0]]
img_height = original_img_shape[0]
img_width = original_img_shape[1]
processed_mask = cv2.resize(processed_mask, (img_width, img_height), interpolation=cv2.INTER_NEAREST).astype(
np.uint8)
# convert colored mask from BGR to RGB for compatibility with PASCAL VOC colors
processed_mask = cv2.cvtColor(processed_mask, cv2.COLOR_BGR2RGB)
```
The result is:
![TF Colored Mask](images/deeplab_tf_colored_mask.png)
As a result, we get two equal segmentation masks.
### Evaluation of the Models
The ``dnn_model_runner`` module proposed in ``dnn/samples`` allows running the full evaluation pipeline on the PASCAL VOC dataset and test execution for the DeepLab MobileNet model.
#### Evaluation Mode
The below line represents running the module in the evaluation mode:
```
python -m dnn_model_runner.dnn_conversion.tf.segmentation.py_to_py_segm
```
The model will be read into OpenCV ``cv.dnn_Net`` object. Evaluation results of TF and OpenCV models (pixel accuracy, mean IoU, inference time) will be written into the log file. Inference time values will be also depicted in a chart to generalize the obtained model information.
Necessary evaluation configurations are defined in the [``test_config.py``](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py):
```python
@dataclass
class TestSegmConfig:
frame_size: int = 500
img_root_dir: str = "./VOC2012"
img_dir: str = os.path.join(img_root_dir, "JPEGImages/")
img_segm_gt_dir: str = os.path.join(img_root_dir, "SegmentationClass/")
# reduced val: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/data/pascal/seg11valid.txt
segm_val_file: str = os.path.join(img_root_dir, "ImageSets/Segmentation/seg11valid.txt")
colour_file_cls: str = os.path.join(img_root_dir, "ImageSets/Segmentation/pascal-classes.txt")
```
These values can be modified in accordance with the chosen model pipeline.
#### Test Mode
The below line represents running the module in the test mode, which provides the steps for the model inference:
```
python -m dnn_model_runner.dnn_conversion.tf.segmentation.py_to_py_segm --test True --default_img_preprocess <True/False> --evaluate False
```
Here the ``default_img_preprocess`` key defines whether you'd like to parametrize the model test process with some particular values or use the default values, for example, for ``scale``, ``mean`` or ``std``.
Test configuration is represented in [``test_config.py``](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) ``TestSegmModuleConfig`` class:
```python
@dataclass
class TestSegmModuleConfig:
segm_test_data_dir: str = "test_data/sem_segm"
test_module_name: str = "segmentation"
test_module_path: str = "segmentation.py"
input_img: str = os.path.join(segm_test_data_dir, "2007_000033.jpg")
model: str = ""
frame_height: str = str(TestSegmConfig.frame_size)
frame_width: str = str(TestSegmConfig.frame_size)
scale: float = 1.0
mean: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
std: List[float] = field(default_factory=list)
crop: bool = False
rgb: bool = True
classes: str = os.path.join(segm_test_data_dir, "pascal-classes.txt")
```
The default image preprocessing options are defined in ``default_preprocess_config.py``:
```python
tf_segm_input_blob = {
"scale": str(1 / 127.5),
"mean": ["127.5", "127.5", "127.5"],
"std": [],
"crop": "False",
"rgb": "True"
}
```
The basis of the model testing is represented in ``samples/dnn/segmentation.py``. ``segmentation.py`` can be executed autonomously with the converted model passed in ``--model``, the test image in ``--input``, and the populated parameters for ``cv2.dnn.blobFromImage``.
To reproduce from scratch the OpenCV steps described in "Model Conversion Pipeline" with ``dnn_model_runner``, execute the below line:
```
python -m dnn_model_runner.dnn_conversion.tf.segmentation.py_to_py_segm --test True --default_img_preprocess True --evaluate False
```


View File

@@ -0,0 +1,324 @@
# High Level API: TextDetectionModel and TextRecognitionModel {#tutorial_dnn_text_spotting}
@tableofcontents
@prev_tutorial{tutorial_dnn_OCR}
@next_tutorial{pytorch_cls_tutorial_dnn_conversion}
| | |
| -: | :- |
| Original author | Wenqing Zhang |
| Compatibility | OpenCV >= 4.5 |
## Introduction
In this tutorial, we will introduce the APIs for TextRecognitionModel and TextDetectionModel in detail.
---
#### TextRecognitionModel:
In the current version, @ref cv::dnn::TextRecognitionModel only supports CNN+RNN+CTC based algorithms,
and the greedy decoding method for CTC is provided.
For more information, please refer to the [original paper](https://arxiv.org/abs/1507.05717)
Before recognition, you should `setVocabulary` and `setDecodeType`.
- "CTC-greedy", the output of the text recognition model should be a probability matrix.
The shape should be `(T, B, Dim)`, where
- `T` is the sequence length
- `B` is the batch size (only support `B=1` in inference)
- and `Dim` is the length of the vocabulary + 1 (the CTC 'Blank' is at index 0 of `Dim`).
@ref cv::dnn::TextRecognitionModel::recognize() is the main function for text recognition.
- The input image should be a cropped text image or an image with `roiRects`
- Other decoding methods may be supported in the future
---
#### TextDetectionModel:
@ref cv::dnn::TextDetectionModel API provides these methods for text detection:
- cv::dnn::TextDetectionModel::detect() returns the results in std::vector<std::vector<Point>> (4-points quadrangles)
- cv::dnn::TextDetectionModel::detectTextRectangles() returns the results in std::vector<cv::RotatedRect> (RBOX-like)
In the current version, @ref cv::dnn::TextDetectionModel supports these algorithms:
- use @ref cv::dnn::TextDetectionModel_DB with "DB" models
- and use @ref cv::dnn::TextDetectionModel_EAST with "EAST" models
The pretrained models provided below are variants of DB (without deformable convolution),
and their performance can be found in Table 1 of the [paper](https://arxiv.org/abs/1911.08947).
For more information, please refer to the [official code](https://github.com/MhLiao/DB)
---
You can train your own model with more data, and convert it into ONNX format.
We encourage you to add new algorithms to these APIs.
## Pretrained Models
#### TextRecognitionModel:
```
crnn.onnx:
url: https://drive.google.com/uc?export=dowload&id=1ooaLR-rkTl8jdpGy1DoQs0-X0lQsB6Fj
sha: 270d92c9ccb670ada2459a25977e8deeaf8380d3,
alphabet_36.txt: https://drive.google.com/uc?export=dowload&id=1oPOYx5rQRp8L6XQciUwmwhMCfX0KyO4b
parameter setting: -rgb=0;
description: The classification number of this model is 36 (0~9 + a~z).
The training dataset is MJSynth.
crnn_cs.onnx:
url: https://drive.google.com/uc?export=dowload&id=12diBsVJrS9ZEl6BNUiRp9s0xPALBS7kt
sha: a641e9c57a5147546f7a2dbea4fd322b47197cd5
alphabet_94.txt: https://drive.google.com/uc?export=dowload&id=1oKXxXKusquimp7XY1mFvj9nwLzldVgBR
parameter setting: -rgb=1;
description: The classification number of this model is 94 (0~9 + a~z + A~Z + punctuations).
The training datasets are MJsynth and SynthText.
crnn_cs_CN.onnx:
url: https://drive.google.com/uc?export=dowload&id=1is4eYEUKH7HR7Gl37Sw4WPXx6Ir8oQEG
sha: 3940942b85761c7f240494cf662dcbf05dc00d14
alphabet_3944.txt: https://drive.google.com/uc?export=dowload&id=18IZUUdNzJ44heWTndDO6NNfIpJMmN-ul
parameter setting: -rgb=1;
description: The classification number of this model is 3944 (0~9 + a~z + A~Z + Chinese characters + special characters).
The training dataset is ReCTS (https://rrc.cvc.uab.es/?ch=12).
```
More models can be found [here](https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing),
which are taken from [clovaai](https://github.com/clovaai/deep-text-recognition-benchmark).
You can train more models by [CRNN](https://github.com/meijieru/crnn.pytorch), and convert models by `torch.onnx.export`.
#### TextDetectionModel:
```
- DB_IC15_resnet50.onnx:
url: https://drive.google.com/uc?export=dowload&id=17_ABp79PlFt9yPCxSaarVc_DKTmrSGGf
sha: bef233c28947ef6ec8c663d20a2b326302421fa3
recommended parameter setting: -inputHeight=736, -inputWidth=1280;
description: This model is trained on ICDAR2015, so it can only detect English text instances.
- DB_IC15_resnet18.onnx:
url: https://drive.google.com/uc?export=dowload&id=1sZszH3pEt8hliyBlTmB-iulxHP1dCQWV
sha: 19543ce09b2efd35f49705c235cc46d0e22df30b
recommended parameter setting: -inputHeight=736, -inputWidth=1280;
description: This model is trained on ICDAR2015, so it can only detect English text instances.
- DB_TD500_resnet50.onnx:
url: https://drive.google.com/uc?export=dowload&id=19YWhArrNccaoSza0CfkXlA8im4-lAGsR
sha: 1b4dd21a6baa5e3523156776970895bd3db6960a
recommended parameter setting: -inputHeight=736, -inputWidth=736;
description: This model is trained on MSRA-TD500, so it can detect both English and Chinese text instances.
- DB_TD500_resnet18.onnx:
url: https://drive.google.com/uc?export=dowload&id=1vY_KsDZZZb_svd5RT6pjyI8BS1nPbBSX
sha: 8a3700bdc13e00336a815fc7afff5dcc1ce08546
recommended parameter setting: -inputHeight=736, -inputWidth=736;
description: This model is trained on MSRA-TD500, so it can detect both English and Chinese text instances.
```
We will release more models of DB [here](https://drive.google.com/drive/folders/1qzNCHfUJOS0NEUOIKn69eCtxdlNPpWbq?usp=sharing) in the future.
```
- EAST:
Download link: https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1
This model is based on https://github.com/argman/EAST
```
## Images for Testing
```
Text Recognition:
url: https://drive.google.com/uc?export=dowload&id=1nMcEy68zDNpIlqAn6xCk_kYcUTIeSOtN
sha: 89205612ce8dd2251effa16609342b69bff67ca3
Text Detection:
url: https://drive.google.com/uc?export=dowload&id=149tAhIcvfCYeyufRoZ9tmc2mZDKE_XrF
sha: ced3c03fb7f8d9608169a913acf7e7b93e07109b
```
## Example for Text Recognition
Step1. Loading images and models with a vocabulary
```cpp
// Load a cropped text line image
// you can find cropped images for testing in "Images for Testing"
int rgb = IMREAD_COLOR; // This should be changed according to the model input requirement.
Mat image = imread("path/to/text_rec_test.png", rgb);
// Load models weights
TextRecognitionModel model("path/to/crnn_cs.onnx");
// The decoding method
// more methods will be supported in future
model.setDecodeType("CTC-greedy");
// Load vocabulary
// vocabulary should be changed according to the text recognition model
std::ifstream vocFile;
vocFile.open("path/to/alphabet_94.txt");
CV_Assert(vocFile.is_open());
String vocLine;
std::vector<String> vocabulary;
while (std::getline(vocFile, vocLine)) {
vocabulary.push_back(vocLine);
}
model.setVocabulary(vocabulary);
```
Step2. Setting Parameters
```cpp
// Normalization parameters
double scale = 1.0 / 127.5;
Scalar mean = Scalar(127.5, 127.5, 127.5);
// The input shape
Size inputSize = Size(100, 32);
model.setInputParams(scale, inputSize, mean);
```
Step3. Inference
```cpp
std::string recognitionResult = model.recognize(image);
std::cout << "'" << recognitionResult << "'" << std::endl;
```
Input image:
![Picture example](text_rec_test.png)
Output:
```
'welcome'
```
## Example for Text Detection
Step1. Loading images and models
```cpp
// Load an image
// you can find some images for testing in "Images for Testing"
Mat frame = imread("/path/to/text_det_test.png");
```
Step2.a Setting Parameters (DB)
```cpp
// Load model weights
TextDetectionModel_DB model("/path/to/DB_TD500_resnet50.onnx");
// Post-processing parameters
float binThresh = 0.3;
float polyThresh = 0.5;
uint maxCandidates = 200;
double unclipRatio = 2.0;
model.setBinaryThreshold(binThresh)
.setPolygonThreshold(polyThresh)
.setMaxCandidates(maxCandidates)
.setUnclipRatio(unclipRatio)
;
// Normalization parameters
double scale = 1.0 / 255.0;
Scalar mean = Scalar(122.67891434, 116.66876762, 104.00698793);
// The input shape
Size inputSize = Size(736, 736);
model.setInputParams(scale, inputSize, mean);
```
Step2.b Setting Parameters (EAST)
```cpp
TextDetectionModel_EAST model("EAST.pb");
float confThreshold = 0.5;
float nmsThreshold = 0.4;
model.setConfidenceThreshold(confThreshold)
     .setNMSThreshold(nmsThreshold)
;
double detScale = 1.0;
Size detInputSize = Size(320, 320);
Scalar detMean = Scalar(123.68, 116.78, 103.94);
bool swapRB = true;
model.setInputParams(detScale, detInputSize, detMean, swapRB);
```
Step3. Inference
```cpp
std::vector<std::vector<Point>> detResults;
model.detect(frame, detResults);
// Visualization
polylines(frame, detResults, true, Scalar(0, 255, 0), 2);
imshow("Text Detection", frame);
waitKey();
```
Output:
![Picture example](text_det_test_results.jpg)
## Example for Text Spotting
After following the steps above, it is easy to get the detection results of an input image.
Then, you can transform and crop the text images for recognition.
For more information, please refer to **Detailed Sample**
```cpp
// Transform and Crop
Mat cropped;
fourPointsTransform(recInput, vertices, cropped);
String recResult = recognizer.recognize(cropped);
```
Output Examples:
![Picture example](detect_test1.jpg)
![Picture example](detect_test2.jpg)
## Source Code
The [source code](https://github.com/opencv/opencv/blob/master/modules/dnn/src/model.cpp)
of these APIs can be found in the DNN module.
## Detailed Sample
For more information, please refer to:
- [samples/dnn/scene_text_recognition.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/scene_text_recognition.cpp)
- [samples/dnn/scene_text_detection.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/scene_text_detection.cpp)
- [samples/dnn/text_detection.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.cpp)
- [samples/dnn/scene_text_spotting.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/scene_text_spotting.cpp)
#### Test with an image
Examples:
```bash
example_dnn_scene_text_recognition -mp=path/to/crnn_cs.onnx -i=path/to/an/image -rgb=1 -vp=/path/to/alphabet_94.txt
example_dnn_scene_text_detection -mp=path/to/DB_TD500_resnet50.onnx -i=path/to/an/image -ih=736 -iw=736
example_dnn_scene_text_spotting -dmp=path/to/DB_IC15_resnet50.onnx -rmp=path/to/crnn_cs.onnx -i=path/to/an/image -iw=1280 -ih=736 -rgb=1 -vp=/path/to/alphabet_94.txt
example_dnn_text_detection -dmp=path/to/EAST.pb -rmp=path/to/crnn_cs.onnx -i=path/to/an/image -rgb=1 -vp=path/to/alphabet_94.txt
```
#### Test on public datasets
Text Recognition:
The download link for testing images can be found in the **Images for Testing**
Examples:
```bash
example_dnn_scene_text_recognition -mp=path/to/crnn.onnx -e=true -edp=path/to/evaluation_data_rec -vp=/path/to/alphabet_36.txt -rgb=0
example_dnn_scene_text_recognition -mp=path/to/crnn_cs.onnx -e=true -edp=path/to/evaluation_data_rec -vp=/path/to/alphabet_94.txt -rgb=1
```
Text Detection:
The download links for testing images can be found in the **Images for Testing**
Examples:
```bash
example_dnn_scene_text_detection -mp=path/to/DB_TD500_resnet50.onnx -e=true -edp=path/to/evaluation_data_det/TD500 -ih=736 -iw=736
example_dnn_scene_text_detection -mp=path/to/DB_IC15_resnet50.onnx -e=true -edp=path/to/evaluation_data_det/IC15 -ih=736 -iw=1280
```


View File

@@ -0,0 +1,54 @@
YOLO DNNs {#tutorial_dnn_yolo}
===============================
@tableofcontents
@prev_tutorial{tutorial_dnn_android}
@next_tutorial{tutorial_dnn_javascript}
| | |
| -: | :- |
| Original author | Alessandro de Oliveira Faria |
| Compatibility | OpenCV >= 3.3.1 |
Introduction
------------
In this text you will learn how to use the opencv_dnn module with the yolo_object_detection sample (using the OpenCV dnn module in real time with device capture, video and images).
We will demonstrate results of this example on the following picture.
![Picture example](images/yolo.jpg)
Examples
--------
VIDEO DEMO:
@youtube{NHtRlndE2cg}
Source Code
-----------
Use a universal sample for object detection models written
[in C++](https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp) and
[in Python](https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.py) languages
Usage examples
--------------
Execute in webcam:
@code{.bash}
$ example_dnn_object_detection --config=[PATH-TO-DARKNET]/cfg/yolo.cfg --model=[PATH-TO-DARKNET]/yolo.weights --classes=object_detection_classes_pascal_voc.txt --width=416 --height=416 --scale=0.00392 --rgb
@endcode
Execute with image or video file:
@code{.bash}
$ example_dnn_object_detection --config=[PATH-TO-DARKNET]/cfg/yolo.cfg --model=[PATH-TO-DARKNET]/yolo.weights --classes=object_detection_classes_pascal_voc.txt --width=416 --height=416 --scale=0.00392 --input=[PATH-TO-IMAGE-OR-VIDEO-FILE] --rgb
@endcode
Questions and suggestions email to: Alessandro de Oliveira Faria cabelo@opensuse.org or OpenCV Team.


View File

@@ -0,0 +1,24 @@
Deep Neural Networks (dnn module) {#tutorial_table_of_content_dnn}
=====================================
- @subpage tutorial_dnn_googlenet
- @subpage tutorial_dnn_halide
- @subpage tutorial_dnn_halide_scheduling
- @subpage tutorial_dnn_android
- @subpage tutorial_dnn_yolo
- @subpage tutorial_dnn_javascript
- @subpage tutorial_dnn_custom_layers
- @subpage tutorial_dnn_OCR
- @subpage tutorial_dnn_text_spotting
#### PyTorch models with OpenCV
In this section you will find the guides, which describe how to run classification, segmentation and detection PyTorch DNN models with OpenCV.
- @subpage pytorch_cls_tutorial_dnn_conversion
- @subpage pytorch_cls_c_tutorial_dnn_conversion
- @subpage pytorch_segm_tutorial_dnn_conversion
#### TensorFlow models with OpenCV
In this section you will find the guides, which describe how to run classification, segmentation and detection TensorFlow DNN models with OpenCV.
- @subpage tf_cls_tutorial_dnn_conversion
- @subpage tf_det_tutorial_dnn_conversion
- @subpage tf_segm_tutorial_dnn_conversion