init - initialize project
54
doc/tutorials/dnn/dnn_OCR/dnn_OCR.markdown
Normal file
@@ -0,0 +1,54 @@
|
||||
# How to run custom OCR model {#tutorial_dnn_OCR}
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_dnn_custom_layers}
|
||||
@next_tutorial{tutorial_dnn_text_spotting}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Zihao Mu |
|
||||
| Compatibility | OpenCV >= 4.3 |
|
||||
|
||||
## Introduction
|
||||
|
||||
In this tutorial, we first introduce how to obtain a custom OCR model, then how to transform your own OCR model so that it can be run correctly by the opencv_dnn module, and finally we will provide some pre-trained models.
|
||||
|
||||
## Train your own OCR model
|
||||
|
||||
[This repository](https://github.com/zihaomu/deep-text-recognition-benchmark) is a good starting point for training your own OCR model. In this repository, MJSynth+SynthText is set as the training set by default. In addition, you can configure the model structure and the dataset you want to use.
|
||||
|
||||
## Transform OCR model to ONNX format and Use it in OpenCV DNN
|
||||
|
||||
After completing the model training, please use [transform_to_onnx.py](https://github.com/zihaomu/deep-text-recognition-benchmark/blob/master/transform_to_onnx.py) to convert the model into ONNX format.
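The converted model can then be loaded directly by the dnn module. Below is a minimal sketch using the Python bindings; the `crnn.onnx` file name and the `100x32` grayscale input size are assumptions based on the default CRNN configuration, so adjust them to your trained model:

@code{.py}
import cv2 as cv

# load the converted recognition model (the file name is just an example)
recognizer = cv.dnn.readNetFromONNX("crnn.onnx")

# a cropped word image; CRNN-style recognizers typically expect a 100x32 grayscale input
crop = cv.imread("word.png", cv.IMREAD_GRAYSCALE)
blob = cv.dnn.blobFromImage(crop, scalefactor=1.0 / 127.5, size=(100, 32), mean=127.5)

recognizer.setInput(blob)
scores = recognizer.forward()  # (sequence_length, batch, alphabet_size) scores for CTC decoding
@endcode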
|
||||
|
||||
#### Execute with a webcam
|
||||
The example code in Python can be found [here](https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.py).
|
||||
|
||||
Example:
|
||||
@code{.bash}
|
||||
$ text_detection -m=[path_to_text_detect_model] -ocr=[path_to_text_recognition_model]
|
||||
@endcode
|
||||
|
||||
## Pre-trained ONNX models are provided
|
||||
|
||||
Some pre-trained models can be found at https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing.
|
||||
|
||||
Their performance on different text recognition datasets is shown in the table below:
|
||||
|
||||
| Model name | IIIT5k (%) | SVT (%) | ICDAR03 (%) | ICDAR13 (%) | ICDAR15 (%) | SVTP (%) | CUTE80 (%) | Average acc. (%) | Parameters (×10^6) |
|
||||
| -------------------- | --------- | ------ | ---------- | ---------- | ---------- | ------- | --------- | --------------- | ------------------ |
|
||||
| DenseNet-CTC | 72.267 | 67.39 | 82.81 | 80 | 48.38 | 49.45 | 42.50 | 63.26 | 0.24 |
|
||||
| DenseNet-BiLSTM-CTC | 73.76 | 72.33 | 86.15 | 83.15 | 50.67 | 57.984 | 49.826 | 67.69 | 3.63 |
|
||||
| VGG-CTC | 75.96 | 75.42 | 85.92 | 83.54 | 54.89 | 57.52 | 50.17 | 69.06 | 5.57 |
|
||||
| CRNN_VGG-BiLSTM-CTC | 82.63 | 82.07 | 92.96 | 88.867 | 66.28 | 71.01 | 62.37 | 78.03 | 8.45 |
|
||||
| ResNet-CTC | 84.00 | 84.08 | 92.39 | 88.96 | 67.74 | 74.73 | 67.60 | 79.93 | 44.28 |
|
||||
|
||||
The performance of the text recognition models was tested with OpenCV DNN and does not include the text detection model.
|
||||
|
||||
#### Model selection suggestions
|
||||
The input of the text recognition model is the output of the text detection model, so the performance of text detection greatly affects the performance of text recognition.
|
||||
|
||||
DenseNet_CTC has the fewest parameters and the best FPS, and is suitable for edge devices, which are very sensitive to computation cost. If you have limited computing resources but want to achieve better accuracy, VGG_CTC is a good choice.
|
||||
|
||||
CRNN_VGG_BiLSTM_CTC is suitable for scenarios that require high recognition accuracy.
|
||||
BIN
doc/tutorials/dnn/dnn_android/10_opencv_dependency.png
Normal file
|
After Width: | Height: | Size: 15 KiB |
BIN
doc/tutorials/dnn/dnn_android/11_demo.jpg
Normal file
|
After Width: | Height: | Size: 118 KiB |
BIN
doc/tutorials/dnn/dnn_android/1_start_new_project.png
Normal file
|
After Width: | Height: | Size: 41 KiB |
BIN
doc/tutorials/dnn/dnn_android/2_start_new_project.png
Normal file
|
After Width: | Height: | Size: 55 KiB |
BIN
doc/tutorials/dnn/dnn_android/3_start_new_project.png
Normal file
|
After Width: | Height: | Size: 34 KiB |
BIN
doc/tutorials/dnn/dnn_android/4_start_new_project.png
Normal file
|
After Width: | Height: | Size: 37 KiB |
BIN
doc/tutorials/dnn/dnn_android/5_setup.png
Normal file
|
After Width: | Height: | Size: 5.6 KiB |
BIN
doc/tutorials/dnn/dnn_android/6_run_empty_project.png
Normal file
|
After Width: | Height: | Size: 9.5 KiB |
BIN
doc/tutorials/dnn/dnn_android/7_import_module.png
Normal file
|
After Width: | Height: | Size: 28 KiB |
BIN
doc/tutorials/dnn/dnn_android/8_import_module.png
Normal file
|
After Width: | Height: | Size: 52 KiB |
BIN
doc/tutorials/dnn/dnn_android/9_opencv_dependency.png
Normal file
|
After Width: | Height: | Size: 56 KiB |
107
doc/tutorials/dnn/dnn_android/dnn_android.markdown
Normal file
@@ -0,0 +1,107 @@
|
||||
# How to run deep networks on Android device {#tutorial_dnn_android}
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_dnn_halide_scheduling}
|
||||
@next_tutorial{tutorial_dnn_yolo}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Dmitry Kurtaev |
|
||||
| Compatibility | OpenCV >= 3.3 |
|
||||
|
||||
## Introduction
|
||||
In this tutorial you will learn how to run deep learning networks on an Android device
|
||||
using OpenCV deep learning module.
|
||||
|
||||
This tutorial was written for the following versions of the corresponding software:
|
||||
- Android Studio 2.3.3
|
||||
- OpenCV 3.3.0+
|
||||
|
||||
## Requirements
|
||||
|
||||
- Download and install Android Studio from https://developer.android.com/studio.
|
||||
|
||||
- Get the latest pre-built OpenCV for Android release from https://github.com/opencv/opencv/releases and unpack it (for example, `opencv-4.X.Y-android-sdk.zip`).
|
||||
|
||||
- Download MobileNet object detection model from https://github.com/chuanqi305/MobileNet-SSD. We need a configuration file `MobileNetSSD_deploy.prototxt` and weights `MobileNetSSD_deploy.caffemodel`.
|
||||
|
||||
## Create an empty Android Studio project
|
||||
- Open Android Studio. Start a new project. Let's call it `opencv_mobilenet`.
|
||||

|
||||
|
||||
- Keep default target settings.
|
||||

|
||||
|
||||
- Use "Empty Activity" template. Name activity as `MainActivity` with a
|
||||
corresponding layout `activity_main`.
|
||||

|
||||
|
||||

|
||||
|
||||
- Wait until the project is created. Go to `Run->Edit Configurations`.
|
||||
Choose `USB Device` as target device for runs.
|
||||

|
||||
Plug in your device and run the project. It should be installed and launched
|
||||
successfully before we continue.
|
||||
@note Read @ref tutorial_android_dev_intro in case of problems.
|
||||
|
||||

|
||||
|
||||
## Add OpenCV dependency
|
||||
|
||||
- Go to `File->New->Import module` and provide a path to `unpacked_OpenCV_package/sdk/java`. The module name is detected automatically.
|
||||
Disable all features that Android Studio suggests in the next window.
|
||||

|
||||
|
||||

|
||||
|
||||
- Open two files:
|
||||
|
||||
1. `AndroidStudioProjects/opencv_mobilenet/app/build.gradle`
|
||||
|
||||
2. `AndroidStudioProjects/opencv_mobilenet/openCVLibrary330/build.gradle`
|
||||
|
||||
Copy both `compileSdkVersion` and `buildToolsVersion` from the first file to
|
||||
the second one.
|
||||
|
||||
`compileSdkVersion 14` -> `compileSdkVersion 26`
|
||||
|
||||
`buildToolsVersion "25.0.0"` -> `buildToolsVersion "26.0.1"`
|
||||
|
||||
- Make the project. There should be no errors at this point.
|
||||
|
||||
- Go to `File->Project Structure`. Add OpenCV module dependency.
|
||||

|
||||
|
||||

|
||||
|
||||
- Install an appropriate OpenCV Manager from `unpacked_OpenCV_package/apk`
onto the target device (this only needs to be done once).
|
||||
@code
|
||||
adb install OpenCV_3.3.0_Manager_3.30_armeabi-v7a.apk
|
||||
@endcode
|
||||
|
||||
- Congratulations! We're ready now to make a sample using OpenCV.
|
||||
|
||||
## Make a sample
|
||||
Our sample takes pictures from a camera, forwards them into a deep network and
|
||||
receives a set of rectangles, class identifiers and confidence values in `[0, 1]`
|
||||
range.
|
||||
|
||||
- First of all, we need to add a necessary widget which displays processed
|
||||
frames. Modify `app/src/main/res/layout/activity_main.xml`:
|
||||
@include android/mobilenet-objdetect/res/layout/activity_main.xml
|
||||
|
||||
- Put downloaded `MobileNetSSD_deploy.prototxt` and `MobileNetSSD_deploy.caffemodel`
|
||||
into `app/build/intermediates/assets/debug` folder.
|
||||
|
||||
- Modify `/app/src/main/AndroidManifest.xml` to enable full-screen mode, set up
|
||||
a correct screen orientation and allow to use a camera.
|
||||
@include android/mobilenet-objdetect/gradle/AndroidManifest.xml
|
||||
|
||||
- Replace content of `app/src/main/java/org/opencv/samples/opencv_mobilenet/MainActivity.java`:
|
||||
@include android/mobilenet-objdetect/src/org/opencv/samples/opencv_mobilenet/MainActivity.java
|
||||
|
||||
- Launch the application and have fun!
|
||||

|
||||
236
doc/tutorials/dnn/dnn_custom_layers/dnn_custom_layers.md
Normal file
@@ -0,0 +1,236 @@
|
||||
# Custom deep learning layers support {#tutorial_dnn_custom_layers}
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_dnn_javascript}
|
||||
@next_tutorial{tutorial_dnn_OCR}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Dmitry Kurtaev |
|
||||
| Compatibility | OpenCV >= 3.4.1 |
|
||||
|
||||
## Introduction
|
||||
Deep learning is a fast growing area. The new approaches to build neural networks
|
||||
usually introduce new types of layers. They could be modifications of existing
|
||||
ones or implement entirely new research ideas.
|
||||
|
||||
OpenCV gives an opportunity to import and run networks from different deep learning
|
||||
frameworks. The most popular layers are already implemented. However, you may face
the problem that your network cannot be imported using OpenCV because of unimplemented layers.
|
||||
|
||||
The first solution is to create a feature request at https://github.com/opencv/opencv/issues
|
||||
mentioning details such as the source of the model and the type of the new layer. A new layer could
|
||||
be implemented if the OpenCV community shares this need.
|
||||
|
||||
The second way is to define a **custom layer** so OpenCV's deep learning engine
|
||||
will know how to use it. This tutorial shows you how to customize the import of
deep learning models.
|
||||
|
||||
## Define a custom layer in C++
|
||||
A deep learning layer is a building block of a network's pipeline.
|
||||
It has connections to **input blobs** and produces results to **output blobs**.
|
||||
It may also have trained **weights** and **hyper-parameters**.
|
||||
Layers' names, types, weights and hyper-parameters are stored in files generated by
the native frameworks during training. If OpenCV meets an unknown layer type, it throws an
exception while trying to read the model:
|
||||
|
||||
```
|
||||
Unspecified error: Can't create layer "layer_name" of type "MyType" in function getLayerInstance
|
||||
```
|
||||
|
||||
To import the model correctly you have to derive a class from cv::dnn::Layer with
|
||||
the following methods:
|
||||
|
||||
@snippet dnn/custom_layers.hpp A custom layer interface
|
||||
|
||||
And register it before the import:
|
||||
|
||||
@snippet dnn/custom_layers.hpp Register a custom layer
|
||||
|
||||
@note `MyType` is a type of unimplemented layer from the thrown exception.
|
||||
|
||||
Let's see what all the methods do:
|
||||
|
||||
- Constructor
|
||||
|
||||
@snippet dnn/custom_layers.hpp MyLayer::MyLayer
|
||||
|
||||
Retrieves hyper-parameters from cv::dnn::LayerParams. If your layer has trainable
|
||||
weights they will be already stored in the Layer's member cv::dnn::Layer::blobs.
|
||||
|
||||
- A static method `create`
|
||||
|
||||
@snippet dnn/custom_layers.hpp MyLayer::create
|
||||
|
||||
This method should create an instance of your layer and return a cv::Ptr to it.
|
||||
|
||||
- Output blobs' shape computation
|
||||
|
||||
@snippet dnn/custom_layers.hpp MyLayer::getMemoryShapes
|
||||
|
||||
Returns the layer's output shapes depending on the input shapes. You may request extra
memory using `internals`.
|
||||
|
||||
- Run a layer
|
||||
|
||||
@snippet dnn/custom_layers.hpp MyLayer::forward
|
||||
|
||||
Implement a layer's logic here. Compute outputs for given inputs.
|
||||
|
||||
@note OpenCV manages the memory allocated for layers. In most cases the same memory
can be reused between layers, so your `forward` implementation should not rely on
the second invocation of `forward` having the same data at `outputs` and `internals`.
|
||||
|
||||
- Optional `finalize` method
|
||||
|
||||
@snippet dnn/custom_layers.hpp MyLayer::finalize
|
||||
|
||||
The chain of methods is the following: the OpenCV deep learning engine calls the `create`
method once, then it calls `getMemoryShapes` for every created layer, and then you
can make some preparations that depend on the known input dimensions at cv::dnn::Layer::finalize.
After the network has been initialized, only the `forward` method is called for every network input.
|
||||
|
||||
@note Varying the input blobs' sizes, such as the height, width or batch size, makes OpenCV
reallocate all the internal memory. That leads to efficiency gaps. Try to initialize
and deploy models using a fixed batch size and fixed image dimensions.
|
||||
|
||||
## Example: custom layer from Caffe
|
||||
Let's create a custom layer `Interp` from https://github.com/cdmh/deeplab-public.
|
||||
It's just a simple resize that takes an input blob of size `N x C x Hi x Wi` and returns
|
||||
an output blob of size `N x C x Ho x Wo` where `N` is a batch size, `C` is a number of channels,
|
||||
`Hi x Wi` and `Ho x Wo` are input and output `height x width` correspondingly.
|
||||
This layer has no trainable weights but it has hyper-parameters to specify an output size.
|
||||
|
||||
For example,
|
||||
~~~~~~~~~~~~~
|
||||
layer {
|
||||
name: "output"
|
||||
type: "Interp"
|
||||
bottom: "input"
|
||||
top: "output"
|
||||
interp_param {
|
||||
height: 9
|
||||
width: 8
|
||||
}
|
||||
}
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
With this in mind, our implementation can look like this:
|
||||
|
||||
@snippet dnn/custom_layers.hpp InterpLayer
|
||||
|
||||
Next we need to register a new layer type and try to import the model.
|
||||
|
||||
@snippet dnn/custom_layers.hpp Register InterpLayer
|
||||
|
||||
## Example: custom layer from TensorFlow
|
||||
This is an example of how to import a network with [tf.image.resize_bilinear](https://www.tensorflow.org/versions/master/api_docs/python/tf/image/resize_bilinear)
|
||||
operation. This is also a resize but with an implementation different from OpenCV's or `Interp` above.
|
||||
|
||||
Let's create a single layer network:
|
||||
~~~~~~~~~~~~~{.py}
|
||||
inp = tf.placeholder(tf.float32, [2, 3, 4, 5], 'input')
|
||||
resized = tf.image.resize_bilinear(inp, size=[9, 8], name='resize_bilinear')
|
||||
~~~~~~~~~~~~~
|
||||
OpenCV sees this TensorFlow graph in the following way:
|
||||
|
||||
```
|
||||
node {
|
||||
name: "input"
|
||||
op: "Placeholder"
|
||||
attr {
|
||||
key: "dtype"
|
||||
value {
|
||||
type: DT_FLOAT
|
||||
}
|
||||
}
|
||||
}
|
||||
node {
|
||||
name: "resize_bilinear/size"
|
||||
op: "Const"
|
||||
attr {
|
||||
key: "dtype"
|
||||
value {
|
||||
type: DT_INT32
|
||||
}
|
||||
}
|
||||
attr {
|
||||
key: "value"
|
||||
value {
|
||||
tensor {
|
||||
dtype: DT_INT32
|
||||
tensor_shape {
|
||||
dim {
|
||||
size: 2
|
||||
}
|
||||
}
|
||||
tensor_content: "\t\000\000\000\010\000\000\000"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
node {
|
||||
name: "resize_bilinear"
|
||||
op: "ResizeBilinear"
|
||||
input: "input:0"
|
||||
input: "resize_bilinear/size"
|
||||
attr {
|
||||
key: "T"
|
||||
value {
|
||||
type: DT_FLOAT
|
||||
}
|
||||
}
|
||||
attr {
|
||||
key: "align_corners"
|
||||
value {
|
||||
b: false
|
||||
}
|
||||
}
|
||||
}
|
||||
library {
|
||||
}
|
||||
```
|
||||
Custom layer import from TensorFlow is designed to put all of a layer's `attr` values into
cv::dnn::LayerParams, and input `Const` blobs into cv::dnn::Layer::blobs.
In our case the resize's output shape will be stored in the layer's `blobs[0]`.
|
||||
|
||||
@snippet dnn/custom_layers.hpp ResizeBilinearLayer
|
||||
|
||||
Next we register a layer and try to import the model.
|
||||
|
||||
@snippet dnn/custom_layers.hpp Register ResizeBilinearLayer
|
||||
|
||||
## Define a custom layer in Python
|
||||
The following example shows how to customize OpenCV's layers in Python.
|
||||
|
||||
Let's consider [Holistically-Nested Edge Detection](https://arxiv.org/abs/1504.06375)
|
||||
deep learning model. It was trained with one and only one difference compared to
the current version of the [Caffe framework](http://caffe.berkeleyvision.org/): its `Crop`
layers, which receive two input blobs and crop the first one to match the spatial dimensions
of the second one, used to crop from the center. Nowadays Caffe's layer does it
from the top-left corner. So, using the latest version of Caffe or OpenCV, you will
get shifted results with filled borders.
|
||||
|
||||
Next we're going to replace OpenCV's `Crop` layer, which performs top-left cropping, with
|
||||
a centric one.
|
||||
|
||||
- Create a class with `getMemoryShapes` and `forward` methods
|
||||
|
||||
@snippet dnn/edge_detection.py CropLayer
|
||||
|
||||
@note Both methods should return lists.
|
||||
|
||||
- Register a new layer.
|
||||
|
||||
@snippet dnn/edge_detection.py Register
|
||||
|
||||
That's it! We've replaced an already implemented OpenCV layer with a custom one.
|
||||
You may find a full script in the [source code](https://github.com/opencv/opencv/tree/master/samples/dnn/edge_detection.py).
|
||||
|
||||
<table border="0">
|
||||
<tr>
|
||||
<td></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
</table>
|
||||
74
doc/tutorials/dnn/dnn_googlenet/dnn_googlenet.markdown
Normal file
@@ -0,0 +1,74 @@
|
||||
Load Caffe framework models {#tutorial_dnn_googlenet}
|
||||
===========================
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@next_tutorial{tutorial_dnn_halide}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Vitaliy Lyudvichenko |
|
||||
| Compatibility | OpenCV >= 3.3 |
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
In this tutorial you will learn how to use opencv_dnn module for image classification by using
|
||||
GoogLeNet trained network from [Caffe model zoo](http://caffe.berkeleyvision.org/model_zoo.html).
|
||||
|
||||
We will demonstrate results of this example on the following picture.
|
||||

|
||||
|
||||
Source Code
|
||||
-----------
|
||||
|
||||
We will be using snippets from the example application, that can be downloaded [here](https://github.com/opencv/opencv/blob/master/samples/dnn/classification.cpp).
|
||||
|
||||
@include dnn/classification.cpp
|
||||
|
||||
Explanation
|
||||
-----------
|
||||
|
||||
-# Firstly, download GoogLeNet model files:
|
||||
[bvlc_googlenet.prototxt ](https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/bvlc_googlenet.prototxt) and
|
||||
[bvlc_googlenet.caffemodel](http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel)
|
||||
|
||||
Also you need file with names of [ILSVRC2012](http://image-net.org/challenges/LSVRC/2012/browse-synsets) classes:
|
||||
[classification_classes_ILSVRC2012.txt](https://github.com/opencv/opencv/blob/master/samples/data/dnn/classification_classes_ILSVRC2012.txt).
|
||||
|
||||
Put these files into the working directory of this example program.
|
||||
|
||||
-# Read and initialize network using path to .prototxt and .caffemodel files
|
||||
@snippet dnn/classification.cpp Read and initialize network
|
||||
|
||||
You can skip the `framework` argument if one of the `model` or `config` files has a
`.caffemodel` or `.prototxt` extension.
This way the cv::dnn::readNet function can automatically detect the model's format.
|
||||
|
||||
-# Read input image and convert to the blob, acceptable by GoogleNet
|
||||
@snippet dnn/classification.cpp Open a video file or an image file or a camera stream
|
||||
|
||||
cv::VideoCapture can load both images and videos.
|
||||
|
||||
@snippet dnn/classification.cpp Create a 4D blob from a frame
|
||||
We convert the image to a 4-dimensional blob (a so-called batch) with shape `1x3x224x224`
after applying the necessary pre-processing, such as resizing and subtraction of the mean values
`(104, 117, 123)` for the blue, green and red channels correspondingly, using the cv::dnn::blobFromImage function.
|
||||
|
||||
-# Pass the blob to the network
|
||||
@snippet dnn/classification.cpp Set input blob
|
||||
|
||||
-# Make forward pass
|
||||
@snippet dnn/classification.cpp Make forward pass
|
||||
During the forward pass output of each network layer is computed, but in this example we need output from the last layer only.
|
||||
|
||||
-# Determine the best class
|
||||
@snippet dnn/classification.cpp Get a class with a highest score
|
||||
We put the output of the network, which contains probabilities for each of the 1000 ILSVRC2012 image classes, into the `prob` blob,
and find the index of the element with the maximal value in it. This index corresponds to the class of the image.
|
||||
|
||||
-# Run an example from command line
|
||||
@code
|
||||
./example_dnn_classification --model=bvlc_googlenet.caffemodel --config=bvlc_googlenet.prototxt --width=224 --height=224 --classes=classification_classes_ILSVRC2012.txt --input=space_shuttle.jpg --mean="104 117 123"
|
||||
@endcode
|
||||
For our image we get a prediction of the class `space shuttle` with more than 99% confidence.
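For readers who prefer the Python bindings, the whole pipeline described above fits into a few lines. This is a sketch under the same assumptions (model files and the test image are located in the working directory):

@code{.py}
import cv2 as cv
import numpy as np

# the framework is inferred from the file extensions (.caffemodel / .prototxt)
net = cv.dnn.readNet("bvlc_googlenet.caffemodel", "bvlc_googlenet.prototxt")

frame = cv.imread("space_shuttle.jpg")
# 1x3x224x224 blob with the mean (104, 117, 123) subtracted, as described above
blob = cv.dnn.blobFromImage(frame, scalefactor=1.0, size=(224, 224),
                            mean=(104, 117, 123), swapRB=False, crop=False)

net.setInput(blob)
prob = net.forward()                   # shape (1, 1000)

class_id = int(np.argmax(prob))        # index of the best class
confidence = float(prob[0, class_id])
print(class_id, confidence)
@endcode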
|
||||
88
doc/tutorials/dnn/dnn_halide/dnn_halide.markdown
Normal file
@@ -0,0 +1,88 @@
|
||||
# How to enable the Halide backend to improve efficiency {#tutorial_dnn_halide}
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_dnn_googlenet}
|
||||
@next_tutorial{tutorial_dnn_halide_scheduling}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Dmitry Kurtaev |
|
||||
| Compatibility | OpenCV >= 3.3 |
|
||||
|
||||
## Introduction
|
||||
This tutorial describes how to run your models in the OpenCV deep learning module
using the Halide language backend. Halide is an open-source project that lets us
write image processing algorithms in a well-readable format, schedule computations
according to a specific device and evaluate them with quite good efficiency.
|
||||
|
||||
An official website of the Halide project: http://halide-lang.org/.
|
||||
|
||||
An up to date efficiency comparison: https://github.com/opencv/opencv/wiki/DNN-Efficiency
|
||||
|
||||
## Requirements
|
||||
### LLVM compiler
|
||||
|
||||
@note LLVM compilation might take a long time.
|
||||
|
||||
- Download LLVM source code from http://releases.llvm.org/4.0.0/llvm-4.0.0.src.tar.xz.
|
||||
Unpack it. Let **llvm_root** be the root directory of the source code.
|
||||
|
||||
- Create directory **llvm_root**/tools/clang
|
||||
|
||||
- Download Clang with the same version as LLVM. In our case it will be from
|
||||
http://releases.llvm.org/4.0.0/cfe-4.0.0.src.tar.xz. Unpack it into
|
||||
**llvm_root**/tools/clang. Note that it should be the root of the Clang source code.
|
||||
|
||||
- Build LLVM on Linux
|
||||
@code
|
||||
cd llvm_root
|
||||
mkdir build && cd build
|
||||
cmake -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_BUILD_TYPE=Release ..
|
||||
make -j4
|
||||
@endcode
|
||||
|
||||
- Build LLVM on Windows (Developer Command Prompt)
|
||||
@code
|
||||
mkdir \\path-to-llvm-build\\ && cd \\path-to-llvm-build\\
|
||||
cmake.exe -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=\\path-to-llvm-install\\ -G "Visual Studio 14 Win64" \\path-to-llvm-src\\
|
||||
MSBuild.exe /m:4 /t:Build /p:Configuration=Release .\\INSTALL.vcxproj
|
||||
@endcode
|
||||
|
||||
@note `\\path-to-llvm-build\\` and `\\path-to-llvm-install\\` are different directories.
|
||||
|
||||
### Halide language.
|
||||
|
||||
- Download source code from GitHub repository, https://github.com/halide/Halide
|
||||
or clone it using git. The root directory will be referred to as **halide_root**.
|
||||
@code
|
||||
git clone https://github.com/halide/Halide.git
|
||||
@endcode
|
||||
|
||||
- Build Halide on Linux
|
||||
@code
|
||||
cd halide_root
|
||||
mkdir build && cd build
|
||||
cmake -DLLVM_DIR=llvm_root/build/lib/cmake/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_VERSION=40 -DWITH_TESTS=OFF -DWITH_APPS=OFF -DWITH_TUTORIALS=OFF ..
|
||||
make -j4
|
||||
@endcode
|
||||
|
||||
- Build Halide on Windows (Developer Command Prompt)
|
||||
@code
|
||||
cd halide_root
|
||||
mkdir build && cd build
|
||||
cmake.exe -DLLVM_DIR=\\path-to-llvm-install\\lib\\cmake\\llvm -DLLVM_VERSION=40 -DWITH_TESTS=OFF -DWITH_APPS=OFF -DWITH_TUTORIALS=OFF -DCMAKE_BUILD_TYPE=Release -G "Visual Studio 14 Win64" ..
|
||||
MSBuild.exe /m:4 /t:Build /p:Configuration=Release .\\ALL_BUILD.vcxproj
|
||||
@endcode
|
||||
|
||||
## Build OpenCV with Halide backend
|
||||
When you build OpenCV add the following configuration flags:
|
||||
|
||||
- `WITH_HALIDE` - enable Halide linkage
|
||||
|
||||
- `HALIDE_ROOT_DIR` - path to Halide build directory
|
||||
|
||||
## Set Halide as a preferable backend
|
||||
@code
|
||||
net.setPreferableBackend(DNN_BACKEND_HALIDE);
|
||||
@endcode
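For instance, with the Python bindings the backend can be enabled for a whole inference pass as follows (a sketch; the model and image file names are placeholders):

@code{.py}
import cv2 as cv

net = cv.dnn.readNet("model.caffemodel", "model.prototxt")
net.setPreferableBackend(cv.dnn.DNN_BACKEND_HALIDE)

blob = cv.dnn.blobFromImage(cv.imread("image.jpg"), size=(224, 224))
net.setInput(blob)
out = net.forward()  # the first call compiles the Halide pipelines, so it is slower than the following ones
@endcode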
|
||||
@@ -0,0 +1,92 @@
|
||||
# How to schedule your network for Halide backend {#tutorial_dnn_halide_scheduling}
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_dnn_halide}
|
||||
@next_tutorial{tutorial_dnn_android}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Dmitry Kurtaev |
|
||||
| Compatibility | OpenCV >= 3.3 |
|
||||
|
||||
## Introduction
|
||||
Halide code is the same for every device we use. But to achieve satisfactory
efficiency we should schedule computations properly. In this tutorial we describe
ways to schedule your networks using the Halide backend in the OpenCV deep learning module.
|
||||
|
||||
For a better understanding of Halide scheduling you might want to read the tutorials at http://halide-lang.org/tutorials.
|
||||
|
||||
If this is your first encounter with Halide in OpenCV, we recommend starting from @ref tutorial_dnn_halide.
|
||||
|
||||
## Configuration files
|
||||
You can schedule the computations of a Halide pipeline by writing textual configuration files.
This means that you can easily vectorize, parallelize and manage the loop order of
layer computations. Pass the path to a file with scheduling directives for a specific
device into ```cv::dnn::Net::setHalideScheduler``` before the first ```cv::dnn::Net::forward``` call.
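In code this looks roughly like the following (a sketch using the Python bindings; the scheduler file name is a placeholder):

@code{.py}
net.setPreferableBackend(cv.dnn.DNN_BACKEND_HALIDE)
net.setHalideScheduler("net_scheduler.yml")  # must be called before the first forward()
out = net.forward()
@endcode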
|
||||
|
||||
Scheduling configuration files are YAML files where each node is a
|
||||
scheduled function or a scheduling directive.
|
||||
@code
|
||||
relu1:
|
||||
reorder: [x, c, y]
|
||||
split: { y: 2, c: 8 }
|
||||
parallel: [yo, co]
|
||||
unroll: yi
|
||||
vectorize: { x: 4 }
|
||||
conv1_constant_exterior:
|
||||
compute_at: { relu1: yi }
|
||||
@endcode
|
||||
|
||||
By convention, we use the variables `n` for the batch dimension, `c` for channels,
`y` for rows and `x` for columns. Variables produced by a split use names
with the same prefix but `o` and `i` suffixes for the outer and inner variables
correspondingly. For example, for a variable `x` in range `[0, 10)` the directive
`split: { x: 2 }` gives new variables `xo` in range `[0, 5)` and `xi` in range `[0, 2)`.
The variable name `x` is no longer available in the same scheduling node.
|
||||
|
||||
You can find scheduling examples at [opencv_extra/testdata/dnn](https://github.com/opencv/opencv_extra/tree/master/testdata/dnn)
|
||||
and use them to schedule your networks.
|
||||
|
||||
## Layers fusing
|
||||
Thanks to layer fusion we can schedule only the top layers of fused sets,
because for every output value we use the fused formula.
For example, if you have three layers Convolution + Scale + ReLU one after another,
|
||||
@code
|
||||
conv(x, y, c, n) = sum(...) + bias(c);
|
||||
scale(x, y, c, n) = conv(x, y, c, n) * weights(c);
|
||||
relu(x, y, c, n) = max(scale(x, y, c, n), 0);
|
||||
@endcode
|
||||
|
||||
fused function is something like
|
||||
@code
|
||||
relu(x, y, c, n) = max((sum(...) + bias(c)) * weights(c), 0);
|
||||
@endcode
|
||||
|
||||
So only the function called `relu` requires scheduling.
|
||||
|
||||
## Scheduling patterns
|
||||
Sometimes networks are built with a blocked structure, which means some layers are
identical or quite similar. If you want to apply the same scheduling to
different layers up to tiling or vectorization factors, define scheduling
patterns in the `patterns` section at the beginning of the scheduling file.
Your patterns may also use some parametric variables.
|
||||
@code
|
||||
# At the beginning of the file
|
||||
patterns:
|
||||
fully_connected:
|
||||
split: { c: c_split }
|
||||
fuse: { src: [x, y, co], dst: block }
|
||||
parallel: block
|
||||
vectorize: { ci: c_split }
|
||||
# Somewhere below
|
||||
fc8:
|
||||
pattern: fully_connected
|
||||
params: { c_split: 8 }
|
||||
@endcode
|
||||
|
||||
## Automatic scheduling
|
||||
You can let DNN schedule layers automatically. Just skip the call to ```cv::dnn::Net::setHalideScheduler```. Sometimes it might be even more efficient than manual scheduling.
But if specific layers require manual scheduling, you can
mix manual and automatic scheduling: write a scheduling file
and skip the layers that you want to be scheduled automatically.
|
||||
54
doc/tutorials/dnn/dnn_javascript/dnn_javascript.markdown
Normal file
@@ -0,0 +1,54 @@
|
||||
# How to run deep networks in browser {#tutorial_dnn_javascript}
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_dnn_yolo}
|
||||
@next_tutorial{tutorial_dnn_custom_layers}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Dmitry Kurtaev |
|
||||
| Compatibility | OpenCV >= 3.3.1 |
|
||||
|
||||
## Introduction
|
||||
This tutorial will show us how to run deep learning models using OpenCV.js right
|
||||
in a browser. The tutorial refers to a sample pipeline of face detection and face
recognition models.
|
||||
|
||||
## Face detection
|
||||
The face detection network gets a BGR image as input and produces a set of bounding boxes
that might contain faces. All we need to do is select the boxes with a strong
confidence.
|
||||
|
||||
## Face recognition
|
||||
The network is called OpenFace (project https://github.com/cmusatyalab/openface).
|
||||
The face recognition model receives an RGB face image of size `96x96` and returns
a `128`-dimensional unit vector that represents the input face as a point on a
multidimensional unit sphere. So the difference between two faces is the angle between
the two output vectors.
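For instance, the comparison of two such vectors can be expressed as the angle between them. A small sketch (in Python with NumPy for brevity; `vec1` and `vec2` are assumed to be the 128-dimensional unit-length outputs of the network):

@code{.py}
import numpy as np

def face_distance(vec1, vec2):
    # both vectors lie on the unit sphere, so their dot product is the cosine of the angle
    cos_angle = float(np.dot(vec1, vec2))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))  # angle in radians; smaller means more similar
@endcode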
|
||||
|
||||
## Sample
|
||||
The whole sample is an HTML page with JavaScript code that uses OpenCV.js functionality.
You may see this page embedded below. Press the `Start` button to begin the demo.
Press `Add a person` to name a person that is currently recognized as an unknown one.
Next we'll discuss the main parts of the code.
|
||||
|
||||
@htmlinclude js_face_recognition.html
|
||||
|
||||
-# Run face detection network to detect faces on input image.
|
||||
@snippet dnn/js_face_recognition.html Run face detection model
|
||||
You may play with input blob sizes to balance detection quality and efficiency.
|
||||
The bigger the input blob, the smaller the faces that may be detected.
|
||||
|
||||
-# Run face recognition network to receive `128`-dimensional unit feature vector by input face image.
|
||||
@snippet dnn/js_face_recognition.html Get 128 floating points feature vector
|
||||
|
||||
-# Perform a recognition.
|
||||
@snippet dnn/js_face_recognition.html Recognize
|
||||
Match a new feature vector against the registered ones. Return the name of the best matched person.
|
||||
|
||||
-# The main loop.
|
||||
@snippet dnn/js_face_recognition.html Define frames processing
|
||||
The main loop of our application receives frames from a camera and runs recognition
on every detected face in the frame. We start this function once OpenCV.js has been
initialized and the deep learning models have been downloaded.
|
||||
|
After Width: | Height: | Size: 59 KiB |
|
After Width: | Height: | Size: 29 KiB |
|
After Width: | Height: | Size: 61 KiB |
|
After Width: | Height: | Size: 61 KiB |
@@ -0,0 +1,220 @@
|
||||
# Conversion of PyTorch Classification Models and Launch with OpenCV C++ {#pytorch_cls_c_tutorial_dnn_conversion}
|
||||
|
||||
@prev_tutorial{pytorch_cls_tutorial_dnn_conversion}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Anastasia Murzova |
|
||||
| Compatibility | OpenCV >= 4.5 |
|
||||
|
||||
## Goals
|
||||
In this tutorial you will learn how to:
|
||||
* convert PyTorch classification models into ONNX format
|
||||
* run converted PyTorch model with OpenCV C/C++ API
|
||||
* provide model inference
|
||||
|
||||
We will explore the above-listed points by the example of ResNet-50 architecture.
|
||||
|
||||
## Introduction
|
||||
Let's briefly view the key concepts involved in the pipeline of PyTorch models transition with OpenCV API. The initial step in conversion of PyTorch models into cv::dnn::Net
|
||||
is model transferring into [ONNX](https://onnx.ai/about.html) format. ONNX aims at the interchangeability of the neural networks between various frameworks. There is a built-in function in PyTorch for ONNX conversion: [``torch.onnx.export``](https://pytorch.org/docs/stable/onnx.html#torch.onnx.export).
|
||||
Further the obtained ``.onnx`` model is passed into cv::dnn::readNetFromONNX or cv::dnn::readNet.
|
||||
|
||||
## Requirements
|
||||
To be able to experiment with the below code you will need to install a set of libraries. We will use a virtual environment with python3.7+ for this:
|
||||
|
||||
```console
|
||||
virtualenv -p /usr/bin/python3.7 <env_dir_path>
|
||||
source <env_dir_path>/bin/activate
|
||||
```
|
||||
|
||||
For OpenCV-Python building from source, follow the corresponding instructions from the @ref tutorial_py_table_of_contents_setup.
|
||||
|
||||
Before you start the installation of the libraries, you can customize the [requirements.txt](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/requirements.txt), excluding or including (for example, ``opencv-python``) some dependencies.
|
||||
The below line initiates requirements installation into the previously activated virtual environment:
|
||||
|
||||
```console
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## Practice
|
||||
In this part we are going to cover the following points:
|
||||
1. create a classification model conversion pipeline
|
||||
2. provide the inference, process prediction results
|
||||
|
||||
### Model Conversion Pipeline
|
||||
The code in this subchapter is located in the ``samples/dnn/dnn_model_runner`` module and can be executed with the line:
|
||||
|
||||
```console
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_resnet50_onnx
|
||||
```
|
||||
|
||||
The following code contains the description of the below-listed steps:
|
||||
1. instantiate PyTorch model
|
||||
2. convert PyTorch model into ``.onnx``
|
||||
|
||||
```python
|
||||
# initialize PyTorch ResNet-50 model
|
||||
original_model = models.resnet50(pretrained=True)
|
||||
|
||||
# get the path to the converted into ONNX PyTorch model
|
||||
full_model_path = get_pytorch_onnx_model(original_model)
|
||||
print("PyTorch ResNet-50 model was successfully converted: ", full_model_path)
|
||||
```
|
||||
|
||||
``get_pytorch_onnx_model(original_model)`` function is based on ``torch.onnx.export(...)`` call:
|
||||
|
||||
```python
|
||||
# define the directory for further converted model save
|
||||
onnx_model_path = "models"
|
||||
# define the name of further converted model
|
||||
onnx_model_name = "resnet50.onnx"
|
||||
|
||||
# create directory for further converted model
|
||||
os.makedirs(onnx_model_path, exist_ok=True)
|
||||
|
||||
# get full path to the converted model
|
||||
full_model_path = os.path.join(onnx_model_path, onnx_model_name)
|
||||
|
||||
# generate model input
|
||||
generated_input = Variable(
|
||||
torch.randn(1, 3, 224, 224)
|
||||
)
|
||||
|
||||
# model export into ONNX format
|
||||
torch.onnx.export(
|
||||
original_model,
|
||||
generated_input,
|
||||
full_model_path,
|
||||
verbose=True,
|
||||
input_names=["input"],
|
||||
output_names=["output"],
|
||||
opset_version=11
|
||||
)
|
||||
```
|
||||
|
||||
After the successful execution of the above code we will get the following output:
|
||||
|
||||
```console
|
||||
PyTorch ResNet-50 model was successfully converted: models/resnet50.onnx
|
||||
```
|
||||
|
||||
The ``dnn_model_runner`` module proposed in ``samples/dnn`` allows us to reproduce the above conversion steps for the following PyTorch classification models:
|
||||
* alexnet
|
||||
* vgg11
|
||||
* vgg13
|
||||
* vgg16
|
||||
* vgg19
|
||||
* resnet18
|
||||
* resnet34
|
||||
* resnet50
|
||||
* resnet101
|
||||
* resnet152
|
||||
* squeezenet1_0
|
||||
* squeezenet1_1
|
||||
* resnext50_32x4d
|
||||
* resnext101_32x8d
|
||||
* wide_resnet50_2
|
||||
* wide_resnet101_2
|
||||
|
||||
To obtain the converted model, the following line should be executed:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name <pytorch_cls_model_name> --evaluate False
|
||||
```
|
||||
|
||||
For the ResNet-50 case the below line should be run:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name resnet50 --evaluate False
|
||||
```
|
||||
|
||||
The default root directory for the converted model storage is defined in module ``CommonConfig``:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class CommonConfig:
|
||||
output_data_root_dir: str = "dnn_model_runner/dnn_conversion"
|
||||
```
|
||||
|
||||
Thus, the converted ResNet-50 will be saved in ``dnn_model_runner/dnn_conversion/models``.
|
||||
|
||||
### Inference Pipeline
|
||||
Now we can use ```models/resnet50.onnx``` for the inference pipeline using OpenCV C/C++ API. The implemented pipeline can be found in [samples/dnn/classification.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/classification.cpp).
|
||||
After the build of samples (``BUILD_EXAMPLES`` flag value should be ``ON``), the appropriate ``example_dnn_classification`` executable file will be provided.
|
||||
|
||||
To provide model inference we will use the below [squirrel photo](https://www.pexels.com/photo/brown-squirrel-eating-1564292) (under [CC0](https://www.pexels.com/terms-of-service/) license) corresponding to ImageNet class ID 335:
|
||||
```console
|
||||
fox squirrel, eastern fox squirrel, Sciurus niger
|
||||
```
|
||||
|
||||

|
||||
|
||||
For the label decoding of the obtained prediction, we also need ``imagenet_classes.txt`` file, which contains the full list of the ImageNet classes.
|
||||
|
||||
In this tutorial we will run the inference process for the converted PyTorch ResNet-50 model from the build (``samples/build``) directory:
|
||||
|
||||
```
|
||||
./dnn/example_dnn_classification --model=../dnn/models/resnet50.onnx --input=../data/squirrel_cls.jpg --width=224 --height=224 --rgb=true --scale="0.003921569" --mean="123.675 116.28 103.53" --std="0.229 0.224 0.225" --crop=true --initial_width=256 --initial_height=256 --classes=../data/dnn/classification_classes_ILSVRC2012.txt
|
||||
```
|
||||
|
||||
Let's explore ``classification.cpp`` key points step by step:
|
||||
|
||||
1. read the model with cv::dnn::readNet, initialize the network:
|
||||
|
||||
```cpp
|
||||
Net net = readNet(model, config, framework);
|
||||
```
|
||||
|
||||
The ``model`` parameter value is taken from ``--model`` key. In our case, it is ``resnet50.onnx``.
|
||||
|
||||
* preprocess input image:
|
||||
|
||||
```cpp
|
||||
if (rszWidth != 0 && rszHeight != 0)
|
||||
{
|
||||
resize(frame, frame, Size(rszWidth, rszHeight));
|
||||
}
|
||||
|
||||
// Create a 4D blob from a frame
|
||||
blobFromImage(frame, blob, scale, Size(inpWidth, inpHeight), mean, swapRB, crop);
|
||||
|
||||
// Check std values.
|
||||
if (std.val[0] != 0.0 && std.val[1] != 0.0 && std.val[2] != 0.0)
|
||||
{
|
||||
// Divide blob by std.
|
||||
divide(blob, std, blob);
|
||||
}
|
||||
```
|
||||
|
||||
In this step we use cv::dnn::blobFromImage function to prepare model input.
|
||||
We set ``Size(rszWidth, rszHeight)`` with ``--initial_width=256 --initial_height=256`` for the initial image resize as it's described in [PyTorch ResNet inference pipeline](https://pytorch.org/hub/pytorch_vision_resnet/).
|
||||
|
||||
It should be noted that firstly in cv::dnn::blobFromImage mean value is subtracted and only then pixel values are multiplied by scale.
|
||||
Thus, we use ``--mean="123.675 116.28 103.53"``, which is equivalent to ``[0.485, 0.456, 0.406]`` multiplied by ``255.0`` to reproduce the original image preprocessing order for PyTorch classification models:
|
||||
|
||||
```python
|
||||
img /= 255.0
|
||||
img -= [0.485, 0.456, 0.406]
|
||||
img /= [0.229, 0.224, 0.225]
|
||||
```
|
||||
|
||||
* make forward pass:
|
||||
|
||||
```cpp
|
||||
net.setInput(blob);
|
||||
Mat prob = net.forward();
|
||||
```
|
||||
|
||||
* process the prediction:
|
||||
|
||||
```cpp
|
||||
Point classIdPoint;
|
||||
double confidence;
|
||||
minMaxLoc(prob.reshape(1, 1), 0, &confidence, 0, &classIdPoint);
|
||||
int classId = classIdPoint.x;
|
||||
```
|
||||
|
||||
Here we choose the most likely object class. The ``classId`` result for our case is 335 - fox squirrel, eastern fox squirrel, Sciurus niger:
|
||||
|
||||

|
||||
@@ -0,0 +1,362 @@
|
||||
# Conversion of PyTorch Classification Models and Launch with OpenCV Python {#pytorch_cls_tutorial_dnn_conversion}
|
||||
|
||||
@prev_tutorial{tutorial_dnn_OCR}
|
||||
@next_tutorial{pytorch_cls_c_tutorial_dnn_conversion}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Anastasia Murzova |
|
||||
| Compatibility | OpenCV >= 4.5 |
|
||||
|
||||
## Goals
|
||||
In this tutorial you will learn how to:
|
||||
* convert PyTorch classification models into ONNX format
|
||||
* run converted PyTorch model with OpenCV Python API
|
||||
* obtain an evaluation of the PyTorch and OpenCV DNN models.
|
||||
|
||||
We will explore the above-listed points by the example of the ResNet-50 architecture.
|
||||
|
||||
## Introduction
|
||||
Let's briefly view the key concepts involved in the pipeline of PyTorch models transition with OpenCV API. The initial step in conversion of PyTorch models into cv.dnn.Net
|
||||
is model transferring into [ONNX](https://onnx.ai/about.html) format. ONNX aims at the interchangeability of the neural networks between various frameworks. There is a built-in function in PyTorch for ONNX conversion: [``torch.onnx.export``](https://pytorch.org/docs/stable/onnx.html#torch.onnx.export).
|
||||
Further the obtained ``.onnx`` model is passed into cv.dnn.readNetFromONNX.
|
||||
|
||||
## Requirements
|
||||
To be able to experiment with the below code you will need to install a set of libraries. We will use a virtual environment with python3.7+ for this:
|
||||
|
||||
```console
|
||||
virtualenv -p /usr/bin/python3.7 <env_dir_path>
|
||||
source <env_dir_path>/bin/activate
|
||||
```
|
||||
|
||||
For OpenCV-Python building from source, follow the corresponding instructions from the @ref tutorial_py_table_of_contents_setup.
|
||||
|
||||
Before you start the installation of the libraries, you can customize the [requirements.txt](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/requirements.txt), excluding or including (for example, ``opencv-python``) some dependencies.
|
||||
The below line initiates requirements installation into the previously activated virtual environment:
|
||||
|
||||
```console
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## Practice
|
||||
In this part we are going to cover the following points:
|
||||
1. create a classification model conversion pipeline and provide the inference
|
||||
2. evaluate and test classification models
|
||||
|
||||
If you'd like merely to run evaluation or test model pipelines, the "Model Conversion Pipeline" part can be skipped.
|
||||
|
||||
### Model Conversion Pipeline
|
||||
The code in this subchapter is located in the ``dnn_model_runner`` module and can be executed with the line:
|
||||
|
||||
```console
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_resnet50
|
||||
```
|
||||
|
||||
The following code contains the description of the below-listed steps:
|
||||
1. instantiate PyTorch model
|
||||
2. convert PyTorch model into ``.onnx``
|
||||
3. read the transferred network with OpenCV API
|
||||
4. prepare input data
|
||||
5. provide inference
|
||||
|
||||
```python
|
||||
# initialize PyTorch ResNet-50 model
|
||||
original_model = models.resnet50(pretrained=True)
|
||||
|
||||
# get the path to the converted into ONNX PyTorch model
|
||||
full_model_path = get_pytorch_onnx_model(original_model)
|
||||
|
||||
# read converted .onnx model with OpenCV API
|
||||
opencv_net = cv2.dnn.readNetFromONNX(full_model_path)
|
||||
print("OpenCV model was successfully read. Layer IDs: \n", opencv_net.getLayerNames())
|
||||
|
||||
# get preprocessed image
|
||||
input_img = get_preprocessed_img("../data/squirrel_cls.jpg")
|
||||
|
||||
# get ImageNet labels
|
||||
imagenet_labels = get_imagenet_labels("../data/dnn/classification_classes_ILSVRC2012.txt")
|
||||
|
||||
# obtain OpenCV DNN predictions
|
||||
get_opencv_dnn_prediction(opencv_net, input_img, imagenet_labels)
|
||||
|
||||
# obtain original PyTorch ResNet50 predictions
|
||||
get_pytorch_dnn_prediction(original_model, input_img, imagenet_labels)
|
||||
```
|
||||
|
||||
To provide model inference we will use the below [squirrel photo](https://www.pexels.com/photo/brown-squirrel-eating-1564292) (under [CC0](https://www.pexels.com/terms-of-service/) license) corresponding to ImageNet class ID 335:
|
||||
```console
|
||||
fox squirrel, eastern fox squirrel, Sciurus niger
|
||||
```
|
||||
|
||||

|
||||
|
||||
For the label decoding of the obtained prediction, we also need ``imagenet_classes.txt`` file, which contains the full list of the ImageNet classes.
|
||||
|
||||
Let's go deeper into each step by the example of pretrained PyTorch ResNet-50:
|
||||
* instantiate PyTorch ResNet-50 model:
|
||||
|
||||
```python
|
||||
# initialize PyTorch ResNet-50 model
|
||||
original_model = models.resnet50(pretrained=True)
|
||||
```
|
||||
|
||||
* convert PyTorch model into ONNX:
|
||||
|
||||
```python
|
||||
# define the directory for further converted model save
|
||||
onnx_model_path = "models"
|
||||
# define the name of further converted model
|
||||
onnx_model_name = "resnet50.onnx"
|
||||
|
||||
# create directory for further converted model
|
||||
os.makedirs(onnx_model_path, exist_ok=True)
|
||||
|
||||
# get full path to the converted model
|
||||
full_model_path = os.path.join(onnx_model_path, onnx_model_name)
|
||||
|
||||
# generate model input
|
||||
generated_input = Variable(
|
||||
torch.randn(1, 3, 224, 224)
|
||||
)
|
||||
|
||||
# model export into ONNX format
|
||||
torch.onnx.export(
|
||||
original_model,
|
||||
generated_input,
|
||||
full_model_path,
|
||||
verbose=True,
|
||||
input_names=["input"],
|
||||
output_names=["output"],
|
||||
opset_version=11
|
||||
)
|
||||
```
|
||||
|
||||
After the successful execution of the above code, we will get ``models/resnet50.onnx``.
|
||||
|
||||
* read the transferred network with cv.dnn.readNetFromONNX, passing the ONNX model obtained in the previous step into it:
|
||||
|
||||
```python
|
||||
# read converted .onnx model with OpenCV API
|
||||
opencv_net = cv2.dnn.readNetFromONNX(full_model_path)
|
||||
```
|
||||
|
||||
* prepare input data:
|
||||
|
||||
```python
|
||||
# read the image
|
||||
input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
|
||||
input_img = input_img.astype(np.float32)
|
||||
|
||||
input_img = cv2.resize(input_img, (256, 256))
|
||||
|
||||
# define preprocess parameters
|
||||
mean = np.array([0.485, 0.456, 0.406]) * 255.0
|
||||
scale = 1 / 255.0
|
||||
std = [0.229, 0.224, 0.225]
|
||||
|
||||
# prepare input blob to fit the model input:
|
||||
# 1. subtract mean
|
||||
# 2. scale to set pixel values from 0 to 1
|
||||
input_blob = cv2.dnn.blobFromImage(
|
||||
image=input_img,
|
||||
scalefactor=scale,
|
||||
size=(224, 224), # img target size
|
||||
mean=mean,
|
||||
swapRB=True, # BGR -> RGB
|
||||
crop=True # center crop
|
||||
)
|
||||
# 3. divide by std
|
||||
input_blob[0] /= np.asarray(std, dtype=np.float32).reshape(3, 1, 1)
|
||||
```
|
||||
|
||||
In this step we read the image and prepare model input with cv.dnn.blobFromImage function, which returns 4-dimensional blob.
|
||||
It should be noted that firstly in cv.dnn.blobFromImage mean value is subtracted and only then pixel values are multiplied by scale. Thus, ``mean`` is multiplied by ``255.0`` to reproduce the original image preprocessing order:
|
||||
|
||||
```python
|
||||
img /= 255.0
|
||||
img -= [0.485, 0.456, 0.406]
|
||||
img /= [0.229, 0.224, 0.225]
|
||||
```
|
||||
|
||||
* OpenCV cv.dnn.Net inference:
|
||||
|
||||
```python
|
||||
# set OpenCV DNN input
|
||||
opencv_net.setInput(preproc_img)
|
||||
|
||||
# OpenCV DNN inference
|
||||
out = opencv_net.forward()
|
||||
print("OpenCV DNN prediction: \n")
|
||||
print("* shape: ", out.shape)
|
||||
|
||||
# get the predicted class ID
|
||||
imagenet_class_id = np.argmax(out)
|
||||
|
||||
# get confidence
|
||||
confidence = out[0][imagenet_class_id]
|
||||
print("* class ID: {}, label: {}".format(imagenet_class_id, imagenet_labels[imagenet_class_id]))
|
||||
print("* confidence: {:.4f}".format(confidence))
|
||||
```
|
||||
|
||||
After the above code execution we will get the following output:
|
||||
|
||||
```console
|
||||
OpenCV DNN prediction:
|
||||
* shape: (1, 1000)
|
||||
* class ID: 335, label: fox squirrel, eastern fox squirrel, Sciurus niger
|
||||
* confidence: 14.8308
|
||||
```
|
||||
|
||||
* PyTorch ResNet-50 model inference:
|
||||
|
||||
```python
|
||||
original_net.eval()
|
||||
preproc_img = torch.FloatTensor(preproc_img)
|
||||
|
||||
# inference
|
||||
out = original_net(preproc_img)
|
||||
print("\nPyTorch model prediction: \n")
|
||||
print("* shape: ", out.shape)
|
||||
|
||||
# get the predicted class ID
|
||||
imagenet_class_id = torch.argmax(out, axis=1).item()
|
||||
print("* class ID: {}, label: {}".format(imagenet_class_id, imagenet_labels[imagenet_class_id]))
|
||||
|
||||
# get confidence
|
||||
confidence = out[0][imagenet_class_id]
|
||||
print("* confidence: {:.4f}".format(confidence.item()))
|
||||
```
|
||||
|
||||
After the above code launching we will get the following output:
|
||||
|
||||
```console
|
||||
PyTorch model prediction:
|
||||
* shape: torch.Size([1, 1000])
|
||||
* class ID: 335, label: fox squirrel, eastern fox squirrel, Sciurus niger
|
||||
* confidence: 14.8308
|
||||
```
|
||||
|
||||
The inference results of the original ResNet-50 model and cv.dnn.Net are equal. For the extended evaluation of the models we can use ``py_to_py_cls`` of the ``dnn_model_runner`` module. This module part will be described in the next subchapter.
|
||||
|
||||
### Evaluation of the Models
|
||||
|
||||
The ``dnn_model_runner`` module proposed in ``samples/dnn`` allows us to run the full evaluation pipeline on the ImageNet dataset and test execution for the following PyTorch classification models:
|
||||
* alexnet
|
||||
* vgg11
|
||||
* vgg13
|
||||
* vgg16
|
||||
* vgg19
|
||||
* resnet18
|
||||
* resnet34
|
||||
* resnet50
|
||||
* resnet101
|
||||
* resnet152
|
||||
* squeezenet1_0
|
||||
* squeezenet1_1
|
||||
* resnext50_32x4d
|
||||
* resnext101_32x8d
|
||||
* wide_resnet50_2
|
||||
* wide_resnet101_2
|
||||
|
||||
This list can be also extended with further appropriate evaluation pipeline configuration.
|
||||
|
||||
#### Evaluation Mode
|
||||
|
||||
The line below runs the module in evaluation mode:
|
||||
|
||||
```console
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name <pytorch_cls_model_name>
|
||||
```
|
||||
|
||||
The classification model chosen from the list will be read into an OpenCV cv.dnn.Net object. Evaluation results of the PyTorch and OpenCV models (accuracy, inference time, L1) will be written into a log file. Inference time values will also be depicted in a chart to generalize the obtained model information.
|
||||
|
||||
Necessary evaluation configurations are defined in the [test_config.py](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) and can be modified in accordance with actual paths of data location:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class TestClsConfig:
|
||||
batch_size: int = 50
|
||||
frame_size: int = 224
|
||||
img_root_dir: str = "./ILSVRC2012_img_val"
|
||||
# location of image-class matching
|
||||
img_cls_file: str = "./val.txt"
|
||||
bgr_to_rgb: bool = True
|
||||
```
|
||||
|
||||
To initiate the evaluation of the PyTorch ResNet-50, run the following line:
|
||||
|
||||
```console
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name resnet50
|
||||
```
|
||||
|
||||
After script launch, the log file with evaluation data will be generated in ``dnn_model_runner/dnn_conversion/logs``:
|
||||
|
||||
```console
|
||||
The model PyTorch resnet50 was successfully obtained and converted to OpenCV DNN resnet50
|
||||
===== Running evaluation of the model with the following params:
|
||||
* val data location: ./ILSVRC2012_img_val
|
||||
* log file location: dnn_model_runner/dnn_conversion/logs/PyTorch_resnet50_log.txt
|
||||
```
|
||||
|
||||
#### Test Mode
|
||||
|
||||
The line below runs the module in test mode, i.e. it provides the steps for model inference:
|
||||
|
||||
```console
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name <pytorch_cls_model_name> --test True --default_img_preprocess <True/False> --evaluate False
|
||||
```
|
||||
|
||||
Here ``default_img_preprocess`` key defines whether you'd like to parametrize the model test process with some particular values or use the default values, for example, ``scale``, ``mean`` or ``std``.
|
||||
|
||||
Test configuration is represented in [test_config.py](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) ``TestClsModuleConfig`` class:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class TestClsModuleConfig:
|
||||
cls_test_data_dir: str = "../data"
|
||||
test_module_name: str = "classification"
|
||||
test_module_path: str = "classification.py"
|
||||
input_img: str = os.path.join(cls_test_data_dir, "squirrel_cls.jpg")
|
||||
model: str = ""
|
||||
|
||||
frame_height: str = str(TestClsConfig.frame_size)
|
||||
frame_width: str = str(TestClsConfig.frame_size)
|
||||
scale: str = "1.0"
|
||||
mean: List[str] = field(default_factory=lambda: ["0.0", "0.0", "0.0"])
|
||||
std: List[str] = field(default_factory=list)
|
||||
crop: str = "False"
|
||||
rgb: str = "True"
|
||||
rsz_height: str = ""
|
||||
rsz_width: str = ""
|
||||
classes: str = os.path.join(cls_test_data_dir, "dnn", "classification_classes_ILSVRC2012.txt")
|
||||
```
|
||||
|
||||
The default image preprocessing options are defined in [default_preprocess_config.py](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/default_preprocess_config.py). For instance:
|
||||
|
||||
```python
|
||||
BASE_IMG_SCALE_FACTOR = 1 / 255.0
|
||||
PYTORCH_RSZ_HEIGHT = 256
|
||||
PYTORCH_RSZ_WIDTH = 256
|
||||
|
||||
pytorch_resize_input_blob = {
|
||||
"mean": ["123.675", "116.28", "103.53"],
|
||||
"scale": str(BASE_IMG_SCALE_FACTOR),
|
||||
"std": ["0.229", "0.224", "0.225"],
|
||||
"crop": "True",
|
||||
"rgb": "True",
|
||||
"rsz_height": str(PYTORCH_RSZ_HEIGHT),
|
||||
"rsz_width": str(PYTORCH_RSZ_WIDTH)
|
||||
}
|
||||
```
|
||||
|
||||
The basis of the model testing is represented in [samples/dnn/classification.py](https://github.com/opencv/opencv/blob/master/samples/dnn/classification.py). ``classification.py`` can be executed autonomously with provided converted model in ``--input`` and populated parameters for cv.dnn.blobFromImage.
|
||||
|
||||
To reproduce from scratch the OpenCV steps described in "Model Conversion Pipeline" with ``dnn_model_runner``, execute the line below:
|
||||
|
||||
```console
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name resnet50 --test True --default_img_preprocess True --evaluate False
|
||||
```
|
||||
|
||||
The network prediction is depicted in the top left corner of the output window:
|
||||
|
||||

|
||||
@@ -0,0 +1,360 @@
|
||||
# Conversion of TensorFlow Classification Models and Launch with OpenCV Python {#tf_cls_tutorial_dnn_conversion}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Anastasia Murzova |
|
||||
| Compatibility | OpenCV >= 4.5 |
|
||||
|
||||
## Goals
|
||||
In this tutorial you will learn how to:
|
||||
* obtain frozen graphs of TensorFlow (TF) classification models
|
||||
* run converted TensorFlow model with OpenCV Python API
|
||||
* obtain an evaluation of the TensorFlow and OpenCV DNN models
|
||||
|
||||
We will explore the above-listed points by the example of MobileNet architecture.
|
||||
|
||||
## Introduction
|
||||
Let's briefly view the key concepts involved in the pipeline of TensorFlow models transition with OpenCV API. The initial step in conversion of TensorFlow models into cv.dnn.Net
|
||||
is obtaining the frozen TF model graph. A frozen graph defines the combination of the model graph structure with the kept values of the required variables, for example, weights. Usually the frozen graph is saved in [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) (```.pb```) files.
|
||||
After the model ``.pb`` file has been generated, it can be read with the cv.dnn.readNetFromTensorflow function.
|
||||
|
||||
## Requirements
|
||||
To be able to experiment with the below code you will need to install a set of libraries. We will use a virtual environment with python3.7+ for this:
|
||||
|
||||
```console
|
||||
virtualenv -p /usr/bin/python3.7 <env_dir_path>
|
||||
source <env_dir_path>/bin/activate
|
||||
```
|
||||
|
||||
For OpenCV-Python building from source, follow the corresponding instructions from the @ref tutorial_py_table_of_contents_setup.
|
||||
|
||||
Before you start the installation of the libraries, you can customize the [requirements.txt](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/requirements.txt), excluding or including (for example, ``opencv-python``) some dependencies.
|
||||
The below line initiates requirements installation into the previously activated virtual environment:
|
||||
|
||||
```console
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## Practice
|
||||
In this part we are going to cover the following points:
|
||||
1. create a TF classification model conversion pipeline and provide the inference
|
||||
2. evaluate and test TF classification models
|
||||
|
||||
If you'd like merely to run evaluation or test model pipelines, the "Model Conversion Pipeline" tutorial part can be skipped.
|
||||
|
||||
### Model Conversion Pipeline
|
||||
The code in this subchapter is located in the ``dnn_model_runner`` module and can be executed with the line:
|
||||
|
||||
```console
|
||||
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_mobilenet
|
||||
```
|
||||
|
||||
The following code contains the description of the below-listed steps:
|
||||
1. instantiate TF model
|
||||
2. create TF frozen graph
|
||||
3. read TF frozen graph with OpenCV API
|
||||
4. prepare input data
|
||||
5. provide inference
|
||||
|
||||
```python
|
||||
# initialize TF MobileNet model
|
||||
original_tf_model = MobileNet(
|
||||
include_top=True,
|
||||
weights="imagenet"
|
||||
)
|
||||
|
||||
# get TF frozen graph path
|
||||
full_pb_path = get_tf_model_proto(original_tf_model)
|
||||
|
||||
# read frozen graph with OpenCV API
|
||||
opencv_net = cv2.dnn.readNetFromTensorflow(full_pb_path)
|
||||
print("OpenCV model was successfully read. Model layers: \n", opencv_net.getLayerNames())
|
||||
|
||||
# get preprocessed image
|
||||
input_img = get_preprocessed_img("../data/squirrel_cls.jpg")
|
||||
|
||||
# get ImageNet labels
|
||||
imagenet_labels = get_imagenet_labels("../data/dnn/classification_classes_ILSVRC2012.txt")
|
||||
|
||||
# obtain OpenCV DNN predictions
|
||||
get_opencv_dnn_prediction(opencv_net, input_img, imagenet_labels)
|
||||
|
||||
# obtain TF model predictions
|
||||
get_tf_dnn_prediction(original_tf_model, input_img, imagenet_labels)
|
||||
```
|
||||
|
||||
To provide model inference we will use the below [squirrel photo](https://www.pexels.com/photo/brown-squirrel-eating-1564292) (under [CC0](https://www.pexels.com/terms-of-service/) license) corresponding to ImageNet class ID 335:
|
||||
```console
|
||||
fox squirrel, eastern fox squirrel, Sciurus niger
|
||||
```
|
||||
|
||||

|
||||
|
||||
For the label decoding of the obtained prediction, we also need ``imagenet_classes.txt`` file, which contains the full list of the ImageNet classes.
|
||||
|
||||
Let's go deeper into each step by the example of pretrained TF MobileNet:
|
||||
* instantiate TF model:
|
||||
|
||||
```python
|
||||
# initialize TF MobileNet model
|
||||
original_tf_model = MobileNet(
|
||||
include_top=True,
|
||||
weights="imagenet"
|
||||
)
|
||||
```
|
||||
|
||||
* create TF frozen graph
|
||||
|
||||
```python
|
||||
# define the directory for .pb model
|
||||
pb_model_path = "models"
|
||||
|
||||
# define the name of .pb model
|
||||
pb_model_name = "mobilenet.pb"
|
||||
|
||||
# create directory for further converted model
|
||||
os.makedirs(pb_model_path, exist_ok=True)
|
||||
|
||||
# get model TF graph
|
||||
tf_model_graph = tf.function(lambda x: tf_model(x))
|
||||
|
||||
# get concrete function
|
||||
tf_model_graph = tf_model_graph.get_concrete_function(
|
||||
tf.TensorSpec(tf_model.inputs[0].shape, tf_model.inputs[0].dtype))
|
||||
|
||||
# obtain frozen concrete function
|
||||
frozen_tf_func = convert_variables_to_constants_v2(tf_model_graph)
|
||||
# get frozen graph
|
||||
frozen_tf_func.graph.as_graph_def()
|
||||
|
||||
# save full tf model
|
||||
tf.io.write_graph(graph_or_graph_def=frozen_tf_func.graph,
|
||||
logdir=pb_model_path,
|
||||
name=pb_model_name,
|
||||
as_text=False)
|
||||
```
|
||||
|
||||
After the successful execution of the above code, we will get a frozen graph in ``models/mobilenet.pb``.
|
||||
|
||||
* read the TF frozen graph with cv.dnn.readNetFromTensorflow, passing the ``mobilenet.pb`` obtained in the previous step into it:
|
||||
|
||||
```python
|
||||
# get TF frozen graph path
|
||||
full_pb_path = get_tf_model_proto(original_tf_model)
|
||||
```
|
||||
|
||||
* prepare input data with cv2.dnn.blobFromImage function:
|
||||
|
||||
```python
|
||||
# read the image
|
||||
input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
|
||||
input_img = input_img.astype(np.float32)
|
||||
|
||||
# define preprocess parameters
|
||||
mean = np.array([1.0, 1.0, 1.0]) * 127.5
|
||||
scale = 1 / 127.5
|
||||
|
||||
# prepare input blob to fit the model input:
|
||||
# 1. subtract mean
|
||||
# 2. scale to set pixel values from 0 to 1
|
||||
input_blob = cv2.dnn.blobFromImage(
|
||||
image=input_img,
|
||||
scalefactor=scale,
|
||||
size=(224, 224), # img target size
|
||||
mean=mean,
|
||||
swapRB=True, # BGR -> RGB
|
||||
crop=True # center crop
|
||||
)
|
||||
print("Input blob shape: {}\n".format(input_blob.shape))
|
||||
```
|
||||
|
||||
Please pay attention to the preprocessing order in the cv2.dnn.blobFromImage function. First, the mean value is subtracted and only then are the pixel values multiplied by the defined scale.
|
||||
Therefore, to reproduce the image preprocessing pipeline from the TF [``mobilenet.preprocess_input``](https://github.com/tensorflow/tensorflow/blob/02032fb477e9417197132648ec81e75beee9063a/tensorflow/python/keras/applications/mobilenet.py#L443-L445) function, we multiply ``mean`` by ``127.5``.
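The following small numeric check illustrates why this works; it is only a sketch, and the channel swap and NCHW layout produced by ``cv2.dnn.blobFromImage`` are ignored here:

```python
import numpy as np

x = np.random.randint(0, 256, (2, 2, 3)).astype(np.float32)

# tf.keras MobileNet preprocessing: map pixel values into [-1, 1]
tf_style = x / 127.5 - 1.0

# cv2.dnn.blobFromImage order: subtract the mean first, then multiply by the scale factor
opencv_style = (x - 127.5) * (1 / 127.5)

assert np.allclose(tf_style, opencv_style)
```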
|
||||
|
||||
As a result, a 4-dimensional ``input_blob`` is obtained:
|
||||
|
||||
``Input blob shape: (1, 3, 224, 224)``
|
||||
|
||||
* provide OpenCV cv.dnn.Net inference:
|
||||
|
||||
```python
|
||||
# set OpenCV DNN input
|
||||
opencv_net.setInput(preproc_img)
|
||||
|
||||
# OpenCV DNN inference
|
||||
out = opencv_net.forward()
|
||||
print("OpenCV DNN prediction: \n")
|
||||
print("* shape: ", out.shape)
|
||||
|
||||
# get the predicted class ID
|
||||
imagenet_class_id = np.argmax(out)
|
||||
|
||||
# get confidence
|
||||
confidence = out[0][imagenet_class_id]
|
||||
print("* class ID: {}, label: {}".format(imagenet_class_id, imagenet_labels[imagenet_class_id]))
|
||||
print("* confidence: {:.4f}\n".format(confidence))
|
||||
```
|
||||
|
||||
After the above code execution we will get the following output:
|
||||
|
||||
```console
|
||||
OpenCV DNN prediction:
|
||||
* shape: (1, 1000)
|
||||
* class ID: 335, label: fox squirrel, eastern fox squirrel, Sciurus niger
|
||||
* confidence: 0.9525
|
||||
```
|
||||
|
||||
* provide TF MobileNet inference:
|
||||
|
||||
```python
|
||||
# inference
|
||||
preproc_img = preproc_img.transpose(0, 2, 3, 1)
|
||||
print("TF input blob shape: {}\n".format(preproc_img.shape))
|
||||
|
||||
out = original_net(preproc_img)
|
||||
|
||||
print("\nTensorFlow model prediction: \n")
|
||||
print("* shape: ", out.shape)
|
||||
|
||||
# get the predicted class ID
|
||||
imagenet_class_id = np.argmax(out)
|
||||
print("* class ID: {}, label: {}".format(imagenet_class_id, imagenet_labels[imagenet_class_id]))
|
||||
|
||||
# get confidence
|
||||
confidence = out[0][imagenet_class_id]
|
||||
print("* confidence: {:.4f}".format(confidence))
|
||||
```
|
||||
|
||||
To fit TF model input, ``input_blob`` was transposed:
|
||||
|
||||
```console
|
||||
TF input blob shape: (1, 224, 224, 3)
|
||||
```
|
||||
|
||||
TF inference results are the following:
|
||||
|
||||
```console
|
||||
TensorFlow model prediction:
|
||||
* shape: (1, 1000)
|
||||
* class ID: 335, label: fox squirrel, eastern fox squirrel, Sciurus niger
|
||||
* confidence: 0.9525
|
||||
```
|
||||
|
||||
As can be seen from the experiments, the OpenCV and TF inference results are equal.
|
||||
|
||||
### Evaluation of the Models
|
||||
|
||||
The ``dnn_model_runner`` module proposed in ``dnn/samples`` allows running the full evaluation pipeline on the ImageNet dataset and test execution for the following TensorFlow classification models:
|
||||
* vgg16
|
||||
* vgg19
|
||||
* resnet50
|
||||
* resnet101
|
||||
* resnet152
|
||||
* densenet121
|
||||
* densenet169
|
||||
* densenet201
|
||||
* inceptionresnetv2
|
||||
* inceptionv3
|
||||
* mobilenet
|
||||
* mobilenetv2
|
||||
* nasnetlarge
|
||||
* nasnetmobile
|
||||
* xception
|
||||
|
||||
This list can be also extended with further appropriate evaluation pipeline configuration.
|
||||
|
||||
#### Evaluation Mode
|
||||
|
||||
The below line represents running of the module in the evaluation mode:
|
||||
|
||||
```console
|
||||
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_cls --model_name <tf_cls_model_name>
|
||||
```
|
||||
|
||||
The classification model chosen from the list will be read into an OpenCV ``cv.dnn_Net`` object. Evaluation results of the TF and OpenCV models (accuracy, inference time, L1) will be written into the log file. Inference time values will also be depicted in a chart to generalize the obtained model information.
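For reference, the logged metrics can be sketched as follows; this is only an illustration of what is reported, not the actual ``dnn_model_runner`` code, and the variable names are hypothetical:

```python
import numpy as np

def top1_accuracy(scores, gt_labels):
    # scores: (N, 1000) class scores, gt_labels: (N,) ground-truth ImageNet class IDs
    return float(np.mean(np.argmax(scores, axis=1) == gt_labels))

def mean_l1_diff(tf_scores, opencv_scores):
    # mean absolute difference between the TF and OpenCV DNN outputs
    return float(np.mean(np.abs(tf_scores - opencv_scores)))
```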
|
||||
|
||||
Necessary evaluation configurations are defined in the [test_config.py](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) and can be modified in accordance with the actual data location paths:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class TestClsConfig:
|
||||
batch_size: int = 50
|
||||
frame_size: int = 224
|
||||
img_root_dir: str = "./ILSVRC2012_img_val"
|
||||
# location of image-class matching
|
||||
img_cls_file: str = "./val.txt"
|
||||
bgr_to_rgb: bool = True
|
||||
```
|
||||
|
||||
The values from ``TestClsConfig`` can be customized in accordance with the chosen model.
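Since ``TestClsConfig`` is a dataclass, its fields can be overridden directly; the values below are purely illustrative (for instance, NASNetLarge models are commonly fed 331x331 inputs):

```python
# hypothetical override of the default evaluation configuration
custom_config = TestClsConfig(
    frame_size=331,                               # e.g. for nasnetlarge
    img_root_dir="/datasets/ILSVRC2012_img_val",  # actual dataset location
    img_cls_file="/datasets/val.txt"
)
```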
|
||||
|
||||
To initiate the evaluation of the TensorFlow MobileNet, run the following line:
|
||||
|
||||
```console
|
||||
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_cls --model_name mobilenet
|
||||
```
|
||||
|
||||
After script launch, the log file with evaluation data will be generated in ``dnn_model_runner/dnn_conversion/logs``:
|
||||
|
||||
```console
|
||||
===== Running evaluation of the model with the following params:
|
||||
* val data location: ./ILSVRC2012_img_val
|
||||
* log file location: dnn_model_runner/dnn_conversion/logs/TF_mobilenet_log.txt
|
||||
```
|
||||
|
||||
#### Test Mode
|
||||
|
||||
The below line represents running the module in test mode, i.e. it provides the steps for the model inference:
|
||||
|
||||
```console
|
||||
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_cls --model_name <tf_cls_model_name> --test True --default_img_preprocess <True/False> --evaluate False
|
||||
```
|
||||
|
||||
Here ``default_img_preprocess`` key defines whether you'd like to parametrize the model test process with some particular values or use the default values, for example, ``scale``, ``mean`` or ``std``.
|
||||
|
||||
Test configuration is represented in [test_config.py](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) ``TestClsModuleConfig`` class:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class TestClsModuleConfig:
|
||||
cls_test_data_dir: str = "../data"
|
||||
test_module_name: str = "classification"
|
||||
test_module_path: str = "classification.py"
|
||||
input_img: str = os.path.join(cls_test_data_dir, "squirrel_cls.jpg")
|
||||
model: str = ""
|
||||
|
||||
frame_height: str = str(TestClsConfig.frame_size)
|
||||
frame_width: str = str(TestClsConfig.frame_size)
|
||||
scale: str = "1.0"
|
||||
mean: List[str] = field(default_factory=lambda: ["0.0", "0.0", "0.0"])
|
||||
std: List[str] = field(default_factory=list)
|
||||
crop: str = "False"
|
||||
rgb: str = "True"
|
||||
rsz_height: str = ""
|
||||
rsz_width: str = ""
|
||||
classes: str = os.path.join(cls_test_data_dir, "dnn", "classification_classes_ILSVRC2012.txt")
|
||||
```
|
||||
|
||||
The default image preprocessing options are defined in ``default_preprocess_config.py``. For instance, for MobileNet:
|
||||
|
||||
```python
|
||||
tf_input_blob = {
|
||||
"mean": ["127.5", "127.5", "127.5"],
|
||||
"scale": str(1 / 127.5),
|
||||
"std": [],
|
||||
"crop": "True",
|
||||
"rgb": "True"
|
||||
}
|
||||
```
|
||||
|
||||
The basis of the model testing is represented in [samples/dnn/classification.py](https://github.com/opencv/opencv/blob/master/samples/dnn/classification.py). ``classification.py`` can be executed autonomously with the converted model passed in ``--input`` and the parameters for cv.dnn.blobFromImage populated.
|
||||
|
||||
To reproduce from scratch the OpenCV steps described in "Model Conversion Pipeline" with ``dnn_model_runner``, execute the line below:
|
||||
|
||||
```console
|
||||
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_cls --model_name mobilenet --test True --default_img_preprocess True --evaluate False
|
||||
```
|
||||
|
||||
The network prediction is depicted in the top left corner of the output window:
|
||||
|
||||

|
||||
|
After Width: | Height: | Size: 81 KiB |
|
After Width: | Height: | Size: 74 KiB |
@@ -0,0 +1,140 @@
|
||||
# Conversion of TensorFlow Detection Models and Launch with OpenCV Python {#tf_det_tutorial_dnn_conversion}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Anastasia Murzova |
|
||||
| Compatibility | OpenCV >= 4.5 |
|
||||
|
||||
## Goals
|
||||
In this tutorial you will learn how to:
|
||||
* obtain frozen graphs of TensorFlow (TF) detection models
|
||||
* run converted TensorFlow model with OpenCV Python API
|
||||
|
||||
We will explore the above-listed points by the example of SSD MobileNetV1.
|
||||
|
||||
## Introduction
|
||||
Let's briefly view the key concepts involved in the pipeline of TensorFlow models transition with OpenCV API. The initial step in the conversion of TensorFlow models into cv.dnn.Net
|
||||
is obtaining the frozen TF model graph. A frozen graph defines the combination of the model graph structure with kept values of the required variables, for example, weights. The frozen graph is saved in [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) (```.pb```) files.
|
||||
There are special functions for reading ``.pb`` graphs in OpenCV: cv.dnn.readNetFromTensorflow and cv.dnn.readNet.
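As a minimal sketch (the file names are illustrative and correspond to the graphs produced later in this tutorial), both functions accept the frozen graph and an optional text graph with the model configuration:

```python
import cv2

# explicit TensorFlow importer: frozen graph plus an optional text graph (.pbtxt)
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "ssd_mobilenet_v1_coco_2017_11_17.pbtxt")

# generic reader: the framework is deduced from the file extensions
net = cv2.dnn.readNet("frozen_inference_graph.pb",
                      "ssd_mobilenet_v1_coco_2017_11_17.pbtxt")
```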
|
||||
|
||||
## Requirements
|
||||
To be able to experiment with the below code you will need to install a set of libraries. We will use a virtual environment with python3.7+ for this:
|
||||
|
||||
```console
|
||||
virtualenv -p /usr/bin/python3.7 <env_dir_path>
|
||||
source <env_dir_path>/bin/activate
|
||||
```
|
||||
|
||||
For OpenCV-Python building from source, follow the corresponding instructions from the @ref tutorial_py_table_of_contents_setup.
|
||||
|
||||
Before you start the installation of the libraries, you can customize the [requirements.txt](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/requirements.txt), excluding or including (for example, ``opencv-python``) some dependencies.
|
||||
The below line initiates requirements installation into the previously activated virtual environment:
|
||||
|
||||
```console
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## Practice
|
||||
In this part we are going to cover the following points:
|
||||
1. create a TF classification model conversion pipeline and provide the inference
|
||||
2. provide the inference and process the prediction results
|
||||
|
||||
### Model Preparation
|
||||
The code in this subchapter is located in the ``samples/dnn/dnn_model_runner`` module and can be executed with the below line:
|
||||
|
||||
```console
|
||||
python -m dnn_model_runner.dnn_conversion.tf.detection.py_to_py_ssd_mobilenet
|
||||
```
|
||||
|
||||
The following code contains the steps of the TF SSD MobileNetV1 model retrieval:
|
||||
|
||||
```python
|
||||
tf_model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
|
||||
graph_extraction_dir = "./"
|
||||
frozen_graph_path = extract_tf_frozen_graph(tf_model_name, graph_extraction_dir)
|
||||
print("Frozen graph path for {}: {}".format(tf_model_name, frozen_graph_path))
|
||||
```
|
||||
|
||||
In the ``extract_tf_frozen_graph`` function we extract the ``frozen_inference_graph.pb`` provided in the model archive for further processing:
|
||||
|
||||
```python
|
||||
# define model archive name
|
||||
tf_model_tar = model_name + '.tar.gz'
|
||||
# define link to retrieve model archive
|
||||
model_link = DETECTION_MODELS_URL + tf_model_tar
|
||||
|
||||
tf_frozen_graph_name = 'frozen_inference_graph'
|
||||
|
||||
try:
|
||||
urllib.request.urlretrieve(model_link, tf_model_tar)
|
||||
except Exception:
|
||||
print("TF {} was not retrieved: {}".format(model_name, model_link))
|
||||
return
|
||||
|
||||
print("TF {} was retrieved.".format(model_name))
|
||||
|
||||
tf_model_tar = tarfile.open(tf_model_tar)
|
||||
frozen_graph_path = ""
|
||||
|
||||
for model_tar_elem in tf_model_tar.getmembers():
|
||||
if tf_frozen_graph_name in os.path.basename(model_tar_elem.name):
|
||||
tf_model_tar.extract(model_tar_elem, extracted_model_path)
|
||||
frozen_graph_path = os.path.join(extracted_model_path, model_tar_elem.name)
|
||||
break
|
||||
tf_model_tar.close()
|
||||
```
|
||||
|
||||
After the successful execution of the above code we will get the following output:
|
||||
|
||||
```console
|
||||
TF ssd_mobilenet_v1_coco_2017_11_17 was retrieved.
|
||||
Frozen graph path for ssd_mobilenet_v1_coco_2017_11_17: ./ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb
|
||||
```
|
||||
|
||||
To provide model inference we will use the below [double-decker bus photo](https://www.pexels.com/photo/bus-and-car-on-one-way-street-3626589/) (under [Pexels](https://www.pexels.com/license/) license):
|
||||
|
||||

|
||||
|
||||
To initiate the test process we need to provide an appropriate model configuration. We will use [``ssd_mobilenet_v1_coco.config``](https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v1_coco.config) from [TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection#tensorflow-object-detection-api).
|
||||
TensorFlow Object Detection API framework contains helpful mechanisms for object detection model manipulations.
|
||||
|
||||
We will use this configuration to provide a text graph representation. To generate ``.pbtxt`` we will use the corresponding [``samples/dnn/tf_text_graph_ssd.py``](https://github.com/opencv/opencv/blob/master/samples/dnn/tf_text_graph_ssd.py) script:
|
||||
|
||||
```console
|
||||
python tf_text_graph_ssd.py --input ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb --config ssd_mobilenet_v1_coco_2017_11_17/ssd_mobilenet_v1_coco.config --output ssd_mobilenet_v1_coco_2017_11_17.pbtxt
|
||||
```
|
||||
|
||||
After successful execution ``ssd_mobilenet_v1_coco_2017_11_17.pbtxt`` will be created.
|
||||
|
||||
Before we run ``object_detection.py``, let's have a look at the default values for the SSD MobileNetV1 test process configuration. They are located in [``models.yml``](https://github.com/opencv/opencv/blob/master/samples/dnn/models.yml):
|
||||
|
||||
```yml
|
||||
ssd_tf:
|
||||
model: "ssd_mobilenet_v1_coco_2017_11_17.pb"
|
||||
config: "ssd_mobilenet_v1_coco_2017_11_17.pbtxt"
|
||||
mean: [0, 0, 0]
|
||||
scale: 1.0
|
||||
width: 300
|
||||
height: 300
|
||||
rgb: true
|
||||
classes: "object_detection_classes_coco.txt"
|
||||
sample: "object_detection"
|
||||
```
|
||||
|
||||
To fetch these values we need to provide frozen graph ``ssd_mobilenet_v1_coco_2017_11_17.pb`` model and text graph ``ssd_mobilenet_v1_coco_2017_11_17.pbtxt``:
|
||||
|
||||
```console
|
||||
python object_detection.py ssd_tf --input ../data/pexels_double_decker_bus.jpg
|
||||
```
|
||||
|
||||
This line is equivalent to:
|
||||
|
||||
```console
|
||||
python object_detection.py --model ssd_mobilenet_v1_coco_2017_11_17.pb --config ssd_mobilenet_v1_coco_2017_11_17.pbtxt --input ../data/pexels_double_decker_bus.jpg --width 300 --height 300 --classes ../data/dnn/object_detection_classes_coco.txt
|
||||
```
|
||||
|
||||
The result is:
|
||||
|
||||

|
||||
|
||||
There are several helpful parameters, which can also be customized to refine the results: the confidence threshold (``--thr``) and the non-maximum suppression (``--nms``) values.
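For example, to keep only more confident detections and to suppress overlapping boxes more aggressively (the values are illustrative):

```console
python object_detection.py ssd_tf --input ../data/pexels_double_decker_bus.jpg --thr 0.6 --nms 0.3
```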
|
||||
@@ -0,0 +1,332 @@
|
||||
# Conversion of PyTorch Segmentation Models and Launch with OpenCV {#pytorch_segm_tutorial_dnn_conversion}
|
||||
|
||||
## Goals
|
||||
In this tutorial you will learn how to:
|
||||
* convert PyTorch segmentation models
|
||||
* run converted PyTorch model with OpenCV
|
||||
* obtain an evaluation of the PyTorch and OpenCV DNN models
|
||||
|
||||
We will explore the above-listed points by the example of the FCN ResNet-50 architecture.
|
||||
|
||||
## Introduction
|
||||
The key points involved in the transition pipeline of the [PyTorch classification](https://link_to_cls_tutorial) and segmentation models with the OpenCV API are the same. The first step is model transferring into [ONNX](https://onnx.ai/about.html) format with the PyTorch [``torch.onnx.export``](https://pytorch.org/docs/stable/onnx.html#torch.onnx.export) built-in function.
|
||||
The obtained ``.onnx`` model is then passed into cv.dnn.readNetFromONNX, which returns a cv.dnn.Net object ready for DNN manipulations.
|
||||
|
||||
## Practice
|
||||
In this part we are going to cover the following points:
|
||||
1. create a segmentation model conversion pipeline and provide the inference
|
||||
2. evaluate and test segmentation models
|
||||
|
||||
If you'd like merely to run evaluation or test model pipelines, the "Model Conversion Pipeline" part can be skipped.
|
||||
|
||||
### Model Conversion Pipeline
|
||||
The code in this subchapter is located in the ``dnn_model_runner`` module and can be executed with the line:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_fcnresnet50
|
||||
```
|
||||
|
||||
The following code contains the description of the below-listed steps:
|
||||
1. instantiate PyTorch model
|
||||
2. convert PyTorch model into ``.onnx``
|
||||
3. read the transferred network with OpenCV API
|
||||
4. prepare input data
|
||||
5. provide inference
|
||||
6. get colored masks from predictions
|
||||
7. visualize results
|
||||
|
||||
```python
|
||||
# initialize PyTorch FCN ResNet-50 model
|
||||
original_model = models.segmentation.fcn_resnet50(pretrained=True)
|
||||
|
||||
# get the path to the converted into ONNX PyTorch model
|
||||
full_model_path = get_pytorch_onnx_model(original_model)
|
||||
|
||||
# read converted .onnx model with OpenCV API
|
||||
opencv_net = cv2.dnn.readNetFromONNX(full_model_path)
|
||||
print("OpenCV model was successfully read. Layer IDs: \n", opencv_net.getLayerNames())
|
||||
|
||||
# get preprocessed image
|
||||
img, input_img = get_processed_imgs("test_data/sem_segm/2007_000033.jpg")
|
||||
|
||||
# obtain OpenCV DNN predictions
|
||||
opencv_prediction = get_opencv_dnn_prediction(opencv_net, input_img)
|
||||
|
||||
# obtain original PyTorch ResNet50 predictions
|
||||
pytorch_prediction = get_pytorch_dnn_prediction(original_model, input_img)
|
||||
|
||||
pascal_voc_classes, pascal_voc_colors = read_colors_info("test_data/sem_segm/pascal-classes.txt")
|
||||
|
||||
# obtain colored segmentation masks
|
||||
opencv_colored_mask = get_colored_mask(img.shape, opencv_prediction, pascal_voc_colors)
|
||||
pytorch_colored_mask = get_colored_mask(img.shape, pytorch_prediction, pascal_voc_colors)
|
||||
|
||||
# obtain palette of PASCAL VOC colors
|
||||
color_legend = get_legend(pascal_voc_classes, pascal_voc_colors)
|
||||
|
||||
cv2.imshow('PyTorch Colored Mask', pytorch_colored_mask)
|
||||
cv2.imshow('OpenCV DNN Colored Mask', opencv_colored_mask)
|
||||
cv2.imshow('Color Legend', color_legend)
|
||||
|
||||
cv2.waitKey(0)
|
||||
```
|
||||
|
||||
To provide the model inference we will use the below picture from the [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) validation dataset:
|
||||
|
||||

|
||||
|
||||
The target segmented result is:
|
||||
|
||||

|
||||
|
||||
For the PASCAL VOC colors decoding and its mapping with the predicted masks, we also need ``pascal-classes.txt`` file, which contains the full list of the PASCAL VOC classes and corresponding colors.
|
||||
|
||||
Let's go deeper into each code step by the example of pretrained PyTorch FCN ResNet-50:
|
||||
* instantiate PyTorch FCN ResNet-50 model:
|
||||
|
||||
```python
|
||||
# initialize PyTorch FCN ResNet-50 model
|
||||
original_model = models.segmentation.fcn_resnet50(pretrained=True)
|
||||
```
|
||||
|
||||
* convert PyTorch model into ONNX format:
|
||||
|
||||
```python
|
||||
# define the directory for further converted model save
|
||||
onnx_model_path = "models"
|
||||
# define the name of further converted model
|
||||
onnx_model_name = "fcnresnet50.onnx"
|
||||
|
||||
# create directory for further converted model
|
||||
os.makedirs(onnx_model_path, exist_ok=True)
|
||||
|
||||
# get full path to the converted model
|
||||
full_model_path = os.path.join(onnx_model_path, onnx_model_name)
|
||||
|
||||
# generate model input to build the graph
|
||||
generated_input = Variable(
|
||||
torch.randn(1, 3, 500, 500)
|
||||
)
|
||||
|
||||
# model export into ONNX format
|
||||
torch.onnx.export(
|
||||
original_model,
|
||||
generated_input,
|
||||
full_model_path,
|
||||
verbose=True,
|
||||
input_names=["input"],
|
||||
output_names=["output"],
|
||||
opset_version=11
|
||||
)
|
||||
```
|
||||
|
||||
The code from this step does not differ from the classification conversion case. Thus, after the successful execution of the above code, we will get ``models/fcnresnet50.onnx``.
|
||||
|
||||
* read the transferred network with cv.dnn.readNetFromONNX passing the obtained in the previous step ONNX model into it:
|
||||
|
||||
```python
|
||||
# read converted .onnx model with OpenCV API
|
||||
opencv_net = cv2.dnn.readNetFromONNX(full_model_path)
|
||||
```
|
||||
|
||||
* prepare input data:
|
||||
|
||||
```python
|
||||
# read the image
|
||||
input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
|
||||
input_img = input_img.astype(np.float32)
|
||||
|
||||
# target image sizes
|
||||
img_height = input_img.shape[0]
|
||||
img_width = input_img.shape[1]
|
||||
|
||||
# define preprocess parameters
|
||||
mean = np.array([0.485, 0.456, 0.406]) * 255.0
|
||||
scale = 1 / 255.0
|
||||
std = [0.229, 0.224, 0.225]
|
||||
|
||||
# prepare input blob to fit the model input:
|
||||
# 1. subtract mean
|
||||
# 2. scale to set pixel values from 0 to 1
|
||||
input_blob = cv2.dnn.blobFromImage(
|
||||
image=input_img,
|
||||
scalefactor=scale,
|
||||
size=(img_width, img_height), # img target size
|
||||
mean=mean,
|
||||
swapRB=True, # BGR -> RGB
|
||||
crop=False # center crop
|
||||
)
|
||||
# 3. divide by std
|
||||
input_blob[0] /= np.asarray(std, dtype=np.float32).reshape(3, 1, 1)
|
||||
```
|
||||
|
||||
In this step we read the image and prepare model input with cv2.dnn.blobFromImage function, which returns 4-dimensional blob.
|
||||
It should be noted that in ``cv2.dnn.blobFromImage`` the mean value is subtracted first and only then are the pixel values scaled. Thus, ``mean`` is multiplied by ``255.0`` to reproduce the original image preprocessing order:
|
||||
|
||||
```python
|
||||
img /= 255.0
|
||||
img -= [0.485, 0.456, 0.406]
|
||||
img /= [0.229, 0.224, 0.225]
|
||||
```
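The following small numeric check illustrates this equivalence; it is only a sketch, and the channel swap and NCHW layout produced by ``cv2.dnn.blobFromImage`` are ignored here:

```python
import numpy as np

x = np.random.randint(0, 256, (4, 4, 3)).astype(np.float32)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# torchvision-style preprocessing: scale to [0, 1], subtract mean, divide by std
torch_style = (x / 255.0 - mean) / std

# blobFromImage-style order: subtract mean * 255 first, then scale, then divide by std
opencv_style = ((x - mean * 255.0) * (1 / 255.0)) / std

assert np.allclose(torch_style, opencv_style, atol=1e-6)
```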
|
||||
|
||||
* OpenCV ``cv.dnn_Net`` inference:
|
||||
|
||||
```python
|
||||
# set OpenCV DNN input
|
||||
opencv_net.setInput(preproc_img)
|
||||
|
||||
# OpenCV DNN inference
|
||||
out = opencv_net.forward()
|
||||
print("OpenCV DNN segmentation prediction: \n")
|
||||
print("* shape: ", out.shape)
|
||||
|
||||
# get IDs of predicted classes
|
||||
out_predictions = np.argmax(out[0], axis=0)
|
||||
```
|
||||
|
||||
After the above code execution we will get the following output:
|
||||
|
||||
```
|
||||
OpenCV DNN segmentation prediction:
|
||||
* shape: (1, 21, 500, 500)
|
||||
```
|
||||
|
||||
Each of the 21 prediction channels, where 21 is the number of PASCAL VOC classes, contains probabilities indicating how likely the pixel is to correspond to that class.
|
||||
|
||||
* PyTorch FCN ResNet-50 model inference:
|
||||
|
||||
```python
|
||||
original_net.eval()
|
||||
preproc_img = torch.FloatTensor(preproc_img)
|
||||
|
||||
with torch.no_grad():
|
||||
# obtaining unnormalized probabilities for each class
|
||||
out = original_net(preproc_img)['out']
|
||||
|
||||
print("\nPyTorch segmentation model prediction: \n")
|
||||
print("* shape: ", out.shape)
|
||||
|
||||
# get IDs of predicted classes
|
||||
out_predictions = out[0].argmax(dim=0)
|
||||
```
|
||||
|
||||
After the above code launching we will get the following output:
|
||||
|
||||
```
|
||||
PyTorch segmentation model prediction:
|
||||
* shape: torch.Size([1, 21, 366, 500])
|
||||
```
|
||||
|
||||
PyTorch prediction also contains probabilities corresponding to each class prediction.
|
||||
|
||||
* get colored masks from predictions:
|
||||
|
||||
```python
|
||||
# convert mask values into PASCAL VOC colors
|
||||
processed_mask = np.stack([colors[color_id] for color_id in segm_mask.flatten()])
|
||||
|
||||
# reshape mask into 3-channel image
|
||||
processed_mask = processed_mask.reshape(mask_height, mask_width, 3)
|
||||
processed_mask = cv2.resize(processed_mask, (img_width, img_height), interpolation=cv2.INTER_NEAREST).astype(
|
||||
np.uint8)
|
||||
|
||||
# convert colored mask from BGR to RGB for compatibility with PASCAL VOC colors
|
||||
processed_mask = cv2.cvtColor(processed_mask, cv2.COLOR_BGR2RGB)
|
||||
```
|
||||
|
||||
In this step we map the probabilities from segmentation masks with appropriate colors of the predicted classes. Let's have a look at the results:
|
||||
|
||||

|
||||
|
||||
For the extended evaluation of the models, we can use ``py_to_py_segm`` script of the ``dnn_model_runner`` module. This module part will be described in the next subchapter.
|
||||
|
||||
### Evaluation of the Models
|
||||
|
||||
The ``dnn_model_runner`` module proposed in ``dnn/samples`` allows running the full evaluation pipeline on the PASCAL VOC dataset and test execution for the following PyTorch segmentation models:
|
||||
* FCN ResNet-50
|
||||
* FCN ResNet-101
|
||||
|
||||
This list can be also extended with further appropriate evaluation pipeline configuration.
|
||||
|
||||
#### Evaluation Mode
|
||||
|
||||
The below line represents running of the module in the evaluation mode:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_segm --model_name <pytorch_segm_model_name>
|
||||
```
|
||||
|
||||
The segmentation model chosen from the list will be read into an OpenCV ``cv.dnn_Net`` object. Evaluation results of the PyTorch and OpenCV models (pixel accuracy, mean IoU, inference time) will be written into the log file. Inference time values will also be depicted in a chart to generalize the obtained model information.
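As a reminder of what these metrics mean, here is a minimal sketch of how pixel accuracy and mean IoU can be computed from a per-image confusion matrix; it is an illustration, not the actual runner code, and it does not handle the PASCAL VOC ignore label (255):

```python
import numpy as np

def segm_metrics(pred_mask, gt_mask, num_classes=21):
    # pred_mask, gt_mask: (H, W) arrays of integer class IDs
    idx = num_classes * gt_mask.flatten() + pred_mask.flatten()
    conf = np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

    pixel_acc = np.diag(conf).sum() / conf.sum()

    # per-class IoU: TP / (TP + FP + FN)
    union = conf.sum(axis=0) + conf.sum(axis=1) - np.diag(conf)
    iou = np.diag(conf) / np.maximum(union, 1)
    return pixel_acc, iou.mean()
```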
|
||||
|
||||
Necessary evaluation configurations are defined in the [``test_config.py``](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py):
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class TestSegmConfig:
|
||||
frame_size: int = 500
|
||||
img_root_dir: str = "./VOC2012"
|
||||
img_dir: str = os.path.join(img_root_dir, "JPEGImages/")
|
||||
img_segm_gt_dir: str = os.path.join(img_root_dir, "SegmentationClass/")
|
||||
# reduced val: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/data/pascal/seg11valid.txt
|
||||
segm_val_file: str = os.path.join(img_root_dir, "ImageSets/Segmentation/seg11valid.txt")
|
||||
colour_file_cls: str = os.path.join(img_root_dir, "ImageSets/Segmentation/pascal-classes.txt")
|
||||
```
|
||||
|
||||
These values can be modified in accordance with chosen model pipeline.
|
||||
|
||||
To initiate the evaluation of the PyTorch FCN ResNet-50, run the following line:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_segm --model_name fcnresnet50
|
||||
```
|
||||
|
||||
#### Test Mode
|
||||
|
||||
The below line represents running of the module in the test mode, which provides the steps for the model inference:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_segm --model_name <pytorch_segm_model_name> --test True --default_img_preprocess <True/False> --evaluate False
|
||||
```
|
||||
|
||||
Here ``default_img_preprocess`` key defines whether you'd like to parametrize the model test process with some particular values or use the default values, for example, ``scale``, ``mean`` or ``std``.
|
||||
|
||||
Test configuration is represented in [``test_config.py``](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) ``TestSegmModuleConfig`` class:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class TestSegmModuleConfig:
|
||||
segm_test_data_dir: str = "test_data/sem_segm"
|
||||
test_module_name: str = "segmentation"
|
||||
test_module_path: str = "segmentation.py"
|
||||
input_img: str = os.path.join(segm_test_data_dir, "2007_000033.jpg")
|
||||
model: str = ""
|
||||
|
||||
frame_height: str = str(TestSegmConfig.frame_size)
|
||||
frame_width: str = str(TestSegmConfig.frame_size)
|
||||
scale: float = 1.0
|
||||
mean: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
|
||||
std: List[float] = field(default_factory=list)
|
||||
crop: bool = False
|
||||
rgb: bool = True
|
||||
classes: str = os.path.join(segm_test_data_dir, "pascal-classes.txt")
|
||||
```
|
||||
|
||||
The default image preprocessing options are defined in ``default_preprocess_config.py``:
|
||||
|
||||
```python
|
||||
pytorch_segm_input_blob = {
|
||||
"mean": ["123.675", "116.28", "103.53"],
|
||||
"scale": str(1 / 255.0),
|
||||
"std": ["0.229", "0.224", "0.225"],
|
||||
"crop": "False",
|
||||
"rgb": "True"
|
||||
}
|
||||
```
|
||||
|
||||
The basis of the model testing is represented in ``samples/dnn/segmentation.py``. ``segmentation.py`` can be executed autonomously with the converted model passed in ``--input`` and the parameters for ``cv2.dnn.blobFromImage`` populated.
|
||||
|
||||
To reproduce from scratch the OpenCV steps described in "Model Conversion Pipeline" with ``dnn_model_runner``, execute the line below:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_segm --model_name fcnresnet50 --test True --default_img_preprocess True --evaluate False
|
||||
```
|
||||
@@ -0,0 +1,406 @@
|
||||
# Conversion of TensorFlow Segmentation Models and Launch with OpenCV {#tf_segm_tutorial_dnn_conversion}
|
||||
|
||||
## Goals
|
||||
In this tutorial you will learn how to:
|
||||
* convert TensorFlow (TF) segmentation models
|
||||
* run converted TensorFlow model with OpenCV
|
||||
* obtain an evaluation of the TensorFlow and OpenCV DNN models
|
||||
|
||||
We will explore the above-listed points by the example of the DeepLab architecture.
|
||||
|
||||
## Introduction
|
||||
The key concepts involved in the transition pipeline of the [TensorFlow classification](https://link_to_cls_tutorial) and segmentation models with the OpenCV API are almost the same, except for the graph optimization phase. The initial step in the conversion of TensorFlow models into cv.dnn.Net
|
||||
is obtaining the frozen TF model graph. A frozen graph defines the combination of the model graph structure with the kept values of the required variables, for example, weights. Usually the frozen graph is saved in [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) (```.pb```) files.
|
||||
To read the generated segmentation model ``.pb`` file with cv.dnn.readNetFromTensorflow, the graph needs to be modified with the TF [graph transform tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms).
|
||||
|
||||
## Practice
|
||||
In this part we are going to cover the following points:
|
||||
1. create a TF segmentation model conversion pipeline and provide the inference
|
||||
2. evaluate and test TF segmentation models
|
||||
|
||||
If you'd like merely to run evaluation or test model pipelines, the "Model Conversion Pipeline" tutorial part can be skipped.
|
||||
|
||||
### Model Conversion Pipeline
|
||||
The code in this subchapter is located in the ``dnn_model_runner`` module and can be executed with the line:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.tf.segmentation.py_to_py_deeplab
|
||||
```
|
||||
|
||||
TensorFlow segmentation models can be found in [TensorFlow Research Models](https://github.com/tensorflow/models/tree/master/research/#tensorflow-research-models) section, which contains the implementations of models on the basis of published research papers.
|
||||
We will retrieve the archive with the pre-trained TF DeepLabV3 from the below link:
|
||||
|
||||
```
|
||||
http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz
|
||||
```
|
||||
|
||||
The full pipeline for obtaining the frozen graph is described in ``deeplab_retrievement.py``:
|
||||
|
||||
```python
|
||||
def get_deeplab_frozen_graph():
|
||||
# define model path to download
|
||||
models_url = 'http://download.tensorflow.org/models/'
|
||||
mobilenetv2_voctrainval = 'deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz'
|
||||
|
||||
# construct model link to download
|
||||
model_link = models_url + mobilenetv2_voctrainval
|
||||
|
||||
try:
|
||||
urllib.request.urlretrieve(model_link, mobilenetv2_voctrainval)
|
||||
except Exception:
|
||||
print("TF DeepLabV3 was not retrieved: {}".format(model_link))
|
||||
return
|
||||
|
||||
tf_model_tar = tarfile.open(mobilenetv2_voctrainval)
|
||||
|
||||
# iterate the obtained model archive
|
||||
for model_tar_elem in tf_model_tar.getmembers():
|
||||
# check whether the model archive contains frozen graph
|
||||
if TF_FROZEN_GRAPH_NAME in os.path.basename(model_tar_elem.name):
|
||||
# extract frozen graph
|
||||
tf_model_tar.extract(model_tar_elem, FROZEN_GRAPH_PATH)
|
||||
|
||||
tf_model_tar.close()
|
||||
```
|
||||
|
||||
After running this script:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.tf.segmentation.deeplab_retrievement
|
||||
```
|
||||
|
||||
we will get ``frozen_inference_graph.pb`` in ``deeplab/deeplabv3_mnv2_pascal_trainval``.
|
||||
|
||||
Before loading the network with OpenCV, the extracted ``frozen_inference_graph.pb`` needs to be optimized.
|
||||
To optimize the graph we use TF ``TransformGraph`` with default parameters:
|
||||
|
||||
```python
|
||||
DEFAULT_OPT_GRAPH_NAME = "optimized_frozen_inference_graph.pb"
|
||||
DEFAULT_INPUTS = "sub_7"
|
||||
DEFAULT_OUTPUTS = "ResizeBilinear_3"
|
||||
DEFAULT_TRANSFORMS = "remove_nodes(op=Identity)" \
|
||||
" merge_duplicate_nodes" \
|
||||
" strip_unused_nodes" \
|
||||
" fold_constants(ignore_errors=true)" \
|
||||
" fold_batch_norms" \
|
||||
" fold_old_batch_norms"
|
||||
|
||||
|
||||
def optimize_tf_graph(
|
||||
in_graph,
|
||||
out_graph=DEFAULT_OPT_GRAPH_NAME,
|
||||
inputs=DEFAULT_INPUTS,
|
||||
outputs=DEFAULT_OUTPUTS,
|
||||
transforms=DEFAULT_TRANSFORMS,
|
||||
is_manual=True,
|
||||
was_optimized=True
|
||||
):
|
||||
# ...
|
||||
|
||||
tf_opt_graph = TransformGraph(
|
||||
tf_graph,
|
||||
inputs,
|
||||
outputs,
|
||||
transforms
|
||||
)
|
||||
```
|
||||
|
||||
To run graph optimization process, execute the line:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.tf.segmentation.tf_graph_optimizer --in_graph deeplab/deeplabv3_mnv2_pascal_trainval/frozen_inference_graph.pb
|
||||
```
|
||||
|
||||
As a result ``deeplab/deeplabv3_mnv2_pascal_trainval`` directory will contain ``optimized_frozen_inference_graph.pb``.
|
||||
|
||||
After we have obtained the model graphs, let's examine the below-listed steps:
|
||||
1. read TF ``frozen_inference_graph.pb`` graph
|
||||
2. read optimized TF frozen graph with OpenCV API
|
||||
3. prepare input data
|
||||
4. provide inference
|
||||
5. get colored masks from predictions
|
||||
6. visualize results
|
||||
|
||||
```python
|
||||
# get TF model graph from the obtained frozen graph
|
||||
deeplab_graph = read_deeplab_frozen_graph(deeplab_frozen_graph_path)
|
||||
|
||||
# read DeepLab frozen graph with OpenCV API
|
||||
opencv_net = cv2.dnn.readNetFromTensorflow(opt_deeplab_frozen_graph_path)
|
||||
print("OpenCV model was successfully read. Model layers: \n", opencv_net.getLayerNames())
|
||||
|
||||
# get processed image
|
||||
original_img_shape, tf_input_blob, opencv_input_img = get_processed_imgs("test_data/sem_segm/2007_000033.jpg")
|
||||
|
||||
# obtain OpenCV DNN predictions
|
||||
opencv_prediction = get_opencv_dnn_prediction(opencv_net, opencv_input_img)
|
||||
|
||||
# obtain TF model predictions
|
||||
tf_prediction = get_tf_dnn_prediction(deeplab_graph, tf_input_blob)
|
||||
|
||||
# get PASCAL VOC classes and colors
|
||||
pascal_voc_classes, pascal_voc_colors = read_colors_info("test_data/sem_segm/pascal-classes.txt")
|
||||
|
||||
# obtain colored segmentation masks
|
||||
opencv_colored_mask = get_colored_mask(original_img_shape, opencv_prediction, pascal_voc_colors)
|
||||
tf_colored_mask = get_tf_colored_mask(original_img_shape, tf_prediction, pascal_voc_colors)
|
||||
|
||||
# obtain palette of PASCAL VOC colors
|
||||
color_legend = get_legend(pascal_voc_classes, pascal_voc_colors)
|
||||
|
||||
cv2.imshow('TensorFlow Colored Mask', tf_colored_mask)
|
||||
cv2.imshow('OpenCV DNN Colored Mask', opencv_colored_mask)
|
||||
|
||||
cv2.imshow('Color Legend', color_legend)
|
||||
```
|
||||
|
||||
To provide the model inference we will use the below picture from the [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) validation dataset:
|
||||
|
||||

|
||||
|
||||
The target segmented result is:
|
||||
|
||||

|
||||
|
||||
For the PASCAL VOC colors decoding and its mapping with the predicted masks, we also need ``pascal-classes.txt`` file, which contains the full list of the PASCAL VOC classes and corresponding colors.
|
||||
|
||||
Let's go deeper into each step by the example of pretrained TF DeepLabV3 MobileNetV2:
|
||||
|
||||
* read the TF ``frozen_inference_graph.pb`` graph:
|
||||
|
||||
```python
|
||||
# init deeplab model graph
|
||||
model_graph = tf.Graph()
|
||||
|
||||
# read the graph definition from the frozen .pb file
|
||||
with tf.io.gfile.GFile(frozen_graph_path, 'rb') as graph_file:
|
||||
tf_model_graph = GraphDef()
|
||||
tf_model_graph.ParseFromString(graph_file.read())
|
||||
|
||||
with model_graph.as_default():
|
||||
tf.import_graph_def(tf_model_graph, name='')
|
||||
```
|
||||
|
||||
* read optimized TF frozen graph with OpenCV API:
|
||||
|
||||
```python
|
||||
# read DeepLab frozen graph with OpenCV API
|
||||
opencv_net = cv2.dnn.readNetFromTensorflow(opt_deeplab_frozen_graph_path)
|
||||
```
|
||||
|
||||
* prepare input data with cv2.dnn.blobFromImage function:
|
||||
|
||||
```python
|
||||
# read the image
|
||||
input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
|
||||
input_img = input_img.astype(np.float32)
|
||||
|
||||
# preprocess image for TF model input
|
||||
tf_preproc_img = cv2.resize(input_img, (513, 513))
|
||||
tf_preproc_img = cv2.cvtColor(tf_preproc_img, cv2.COLOR_BGR2RGB)
|
||||
|
||||
# define preprocess parameters for OpenCV DNN
|
||||
mean = np.array([1.0, 1.0, 1.0]) * 127.5
|
||||
scale = 1 / 127.5
|
||||
|
||||
# prepare input blob to fit the model input:
|
||||
# 1. subtract mean
|
||||
# 2. scale to set pixel values from 0 to 1
|
||||
input_blob = cv2.dnn.blobFromImage(
|
||||
image=input_img,
|
||||
scalefactor=scale,
|
||||
size=(513, 513), # img target size
|
||||
mean=mean,
|
||||
swapRB=True, # BGR -> RGB
|
||||
crop=False # center crop
|
||||
)
|
||||
```
|
||||
|
||||
Please pay attention to the preprocessing order in the ``cv2.dnn.blobFromImage`` function. First, the mean value is subtracted and only then are the pixel values multiplied by the defined scale.
|
||||
Therefore, to reproduce TF image preprocessing pipeline, we multiply ``mean`` by ``127.5``.
|
||||
Another important point is the image preprocessing for TF DeepLab. To pass the image into the TF model we only need to construct an appropriate shape; the rest of the image preprocessing is described in [feature_extractor.py](https://github.com/tensorflow/models/blob/master/research/deeplab/core/feature_extractor.py) and will be invoked automatically.
|
||||
|
||||
* provide OpenCV ``cv.dnn_Net`` inference:
|
||||
|
||||
```python
|
||||
# set OpenCV DNN input
|
||||
opencv_net.setInput(preproc_img)
|
||||
|
||||
# OpenCV DNN inference
|
||||
out = opencv_net.forward()
|
||||
print("OpenCV DNN segmentation prediction: \n")
|
||||
print("* shape: ", out.shape)
|
||||
|
||||
# get IDs of predicted classes
|
||||
out_predictions = np.argmax(out[0], axis=0)
|
||||
```
|
||||
|
||||
After the above code execution we will get the following output:
|
||||
|
||||
```
|
||||
OpenCV DNN segmentation prediction:
|
||||
* shape: (1, 21, 513, 513)
|
||||
|
||||
```
|
||||
|
||||
Each of the 21 prediction channels, where 21 is the number of PASCAL VOC classes, contains probabilities indicating how likely the pixel is to correspond to that class.
|
||||
|
||||
* provide TF model inference:
|
||||
|
||||
```python
|
||||
preproc_img = np.expand_dims(preproc_img, 0)
|
||||
|
||||
# init TF session
|
||||
tf_session = Session(graph=model_graph)
|
||||
|
||||
input_tensor_name = "ImageTensor:0"
|
||||
output_tensor_name = "SemanticPredictions:0"
|
||||
|
||||
# run inference
|
||||
out = tf_session.run(
|
||||
output_tensor_name,
|
||||
feed_dict={input_tensor_name: preproc_img}
|
||||
)
|
||||
|
||||
print("TF segmentation model prediction: \n")
|
||||
print("* shape: ", out.shape)
|
||||
```
|
||||
|
||||
TF inference results are the following:
|
||||
|
||||
```
|
||||
TF segmentation model prediction:
|
||||
* shape: (1, 513, 513)
|
||||
```
|
||||
|
||||
The TensorFlow prediction contains the indexes of the corresponding PASCAL VOC classes.
|
||||
|
||||
* transform OpenCV prediction into colored mask:
|
||||
|
||||
```python
|
||||
mask_height = segm_mask.shape[0]
|
||||
mask_width = segm_mask.shape[1]
|
||||
|
||||
img_height = original_img_shape[0]
|
||||
img_width = original_img_shape[1]
|
||||
|
||||
# convert mask values into PASCAL VOC colors
|
||||
processed_mask = np.stack([colors[color_id] for color_id in segm_mask.flatten()])
|
||||
|
||||
# reshape mask into 3-channel image
|
||||
processed_mask = processed_mask.reshape(mask_height, mask_width, 3)
|
||||
processed_mask = cv2.resize(processed_mask, (img_width, img_height), interpolation=cv2.INTER_NEAREST).astype(
|
||||
np.uint8)
|
||||
|
||||
# convert colored mask from BGR to RGB
|
||||
processed_mask = cv2.cvtColor(processed_mask, cv2.COLOR_BGR2RGB)
|
||||
```
|
||||
|
||||
In this step we map the probabilities from segmentation masks with appropriate colors of the predicted classes. Let's have a look at the results:
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
* transform TF prediction into colored mask:
|
||||
|
||||
```python
|
||||
colors = np.array(colors)
|
||||
processed_mask = colors[segm_mask[0]]
|
||||
|
||||
img_height = original_img_shape[0]
|
||||
img_width = original_img_shape[1]
|
||||
|
||||
processed_mask = cv2.resize(processed_mask, (img_width, img_height), interpolation=cv2.INTER_NEAREST).astype(
|
||||
np.uint8)
|
||||
|
||||
# convert colored mask from BGR to RGB for compatibility with PASCAL VOC colors
|
||||
processed_mask = cv2.cvtColor(processed_mask, cv2.COLOR_BGR2RGB)
|
||||
```
|
||||
|
||||
The result is:
|
||||
|
||||

|
||||
|
||||
As a result, we get two equal segmentation masks.
|
||||
|
||||
### Evaluation of the Models
|
||||
|
||||
The ``dnn_model_runner`` module proposed in ``dnn/samples`` allows running the full evaluation pipeline on the PASCAL VOC dataset and test execution for the DeepLab MobileNet model.
|
||||
|
||||
#### Evaluation Mode
|
||||
|
||||
The below line represents running of the module in the evaluation mode:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.tf.segmentation.py_to_py_segm
|
||||
```
|
||||
|
||||
The model will be read into OpenCV ``cv.dnn_Net`` object. Evaluation results of TF and OpenCV models (pixel accuracy, mean IoU, inference time) will be written into the log file. Inference time values will be also depicted in a chart to generalize the obtained model information.
|
||||
|
||||
Necessary evaluation configurations are defined in the [``test_config.py``](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py):
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class TestSegmConfig:
|
||||
frame_size: int = 500
|
||||
img_root_dir: str = "./VOC2012"
|
||||
img_dir: str = os.path.join(img_root_dir, "JPEGImages/")
|
||||
img_segm_gt_dir: str = os.path.join(img_root_dir, "SegmentationClass/")
|
||||
# reduced val: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/data/pascal/seg11valid.txt
|
||||
segm_val_file: str = os.path.join(img_root_dir, "ImageSets/Segmentation/seg11valid.txt")
|
||||
colour_file_cls: str = os.path.join(img_root_dir, "ImageSets/Segmentation/pascal-classes.txt")
|
||||
```
|
||||
|
||||
These values can be modified in accordance with chosen model pipeline.
|
||||
|
||||
#### Test Mode
|
||||
|
||||
The below line represents running of the module in the test mode, which provides the steps for the model inference:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.tf.segmentation.py_to_py_segm --test True --default_img_preprocess <True/False> --evaluate False
|
||||
```
|
||||
|
||||
Here ``default_img_preprocess`` key defines whether you'd like to parametrize the model test process with some particular values or use the default values, for example, ``scale``, ``mean`` or ``std``.
|
||||
|
||||
Test configuration is represented in [``test_config.py``](https://github.com/opencv/opencv/tree/master/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) ``TestSegmModuleConfig`` class:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class TestSegmModuleConfig:
|
||||
segm_test_data_dir: str = "test_data/sem_segm"
|
||||
test_module_name: str = "segmentation"
|
||||
test_module_path: str = "segmentation.py"
|
||||
input_img: str = os.path.join(segm_test_data_dir, "2007_000033.jpg")
|
||||
model: str = ""
|
||||
|
||||
frame_height: str = str(TestSegmConfig.frame_size)
|
||||
frame_width: str = str(TestSegmConfig.frame_size)
|
||||
scale: float = 1.0
|
||||
mean: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
|
||||
std: List[float] = field(default_factory=list)
|
||||
crop: bool = False
|
||||
rgb: bool = True
|
||||
classes: str = os.path.join(segm_test_data_dir, "pascal-classes.txt")
|
||||
```
|
||||
|
||||
The default image preprocessing options are defined in ``default_preprocess_config.py``:
|
||||
|
||||
```python
|
||||
tf_segm_input_blob = {
|
||||
"scale": str(1 / 127.5),
|
||||
"mean": ["127.5", "127.5", "127.5"],
|
||||
"std": [],
|
||||
"crop": "False",
|
||||
"rgb": "True"
|
||||
}
|
||||
```
|
||||
|
||||
The basis of the model testing is represented in ``samples/dnn/segmentation.py``. ``segmentation.py`` can be executed autonomously with the converted model passed in ``--input`` and the parameters for ``cv2.dnn.blobFromImage`` populated.
|
||||
|
||||
To reproduce from scratch the OpenCV steps described in "Model Conversion Pipeline" with ``dnn_model_runner``, execute the line below:
|
||||
|
||||
```
|
||||
python -m dnn_model_runner.dnn_conversion.tf.segmentation.py_to_py_segm --test True --default_img_preprocess True --evaluate False
|
||||
```
|
||||
BIN
doc/tutorials/dnn/dnn_text_spotting/detect_test1.jpg
Normal file
|
After Width: | Height: | Size: 40 KiB |
BIN
doc/tutorials/dnn/dnn_text_spotting/detect_test2.jpg
Normal file
|
After Width: | Height: | Size: 90 KiB |
324
doc/tutorials/dnn/dnn_text_spotting/dnn_text_spotting.markdown
Normal file
@@ -0,0 +1,324 @@
|
||||
# High Level API: TextDetectionModel and TextRecognitionModel {#tutorial_dnn_text_spotting}
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_dnn_OCR}
|
||||
@next_tutorial{pytorch_cls_tutorial_dnn_conversion}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Wenqing Zhang |
|
||||
| Compatibility | OpenCV >= 4.5 |
|
||||
|
||||
## Introduction
|
||||
In this tutorial, we will introduce the APIs for TextRecognitionModel and TextDetectionModel in detail.
|
||||
|
||||
---
|
||||
#### TextRecognitionModel:
|
||||
|
||||
In the current version, @ref cv::dnn::TextRecognitionModel only supports CNN+RNN+CTC based algorithms,
|
||||
and the greedy decoding method for CTC is provided.
|
||||
For more information, please refer to the [original paper](https://arxiv.org/abs/1507.05717).
|
||||
|
||||
Before recognition, you should `setVocabulary` and `setDecodeType`.
|
||||
- "CTC-greedy", the output of the text recognition model should be a probability matrix.
|
||||
The shape should be `(T, B, Dim)`, where
|
||||
- `T` is the sequence length
|
||||
- `B` is the batch size (only support `B=1` in inference)
|
||||
- and `Dim` is the vocabulary length + 1 (the CTC 'Blank' token is at index 0 of `Dim`).
|
||||
|
||||
@ref cv::dnn::TextRecognitionModel::recognize() is the main function for text recognition.
|
||||
- The input image should be a cropped text image or an image with `roiRects`
|
||||
- Other decoding methods may be supported in the future
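Below is a minimal Python sketch of the recognition workflow. It assumes the flattened class name ``cv2.dnn_TextRecognitionModel`` exposed by the Python bindings, uses ``crnn_cs.onnx`` with its ``alphabet_94.txt`` vocabulary from the "Pretrained Models" section, and the preprocessing values shown are typical for the provided CRNN models; the input image name is hypothetical.

```python
import cv2

# load the recognition model (see the "Pretrained Models" section below)
recognizer = cv2.dnn_TextRecognitionModel("crnn_cs.onnx")
recognizer.setDecodeType("CTC-greedy")

# vocabulary: one symbol per line, the CTC 'Blank' is handled internally
with open("alphabet_94.txt", "rt") as f:
    vocabulary = [line.rstrip("\n") for line in f]
recognizer.setVocabulary(vocabulary)

# typical preprocessing for the provided CRNN models (100x32 text crops)
recognizer.setInputParams(scale=1.0 / 127.5, size=(100, 32),
                          mean=(127.5, 127.5, 127.5))

cropped = cv2.imread("word_crop.jpg")  # a cropped text image
print(recognizer.recognize(cropped))
```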
|
||||
|
||||
---
|
||||
|
||||
#### TextDetectionModel:
|
||||
|
||||
@ref cv::dnn::TextDetectionModel API provides these methods for text detection:
|
||||
- cv::dnn::TextDetectionModel::detect() returns the results in std::vector<std::vector<Point>> (4-points quadrangles)
|
||||
- cv::dnn::TextDetectionModel::detectTextRectangles() returns the results in std::vector<cv::RotatedRect> (RBOX-like)
|
||||
|
||||
In the current version, @ref cv::dnn::TextDetectionModel supports these algorithms:
|
||||
- use @ref cv::dnn::TextDetectionModel_DB with "DB" models
|
||||
- and use @ref cv::dnn::TextDetectionModel_EAST with "EAST" models
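
As a rough sketch of the DB case above, both output forms can be obtained like this. The model path is a placeholder, and the input size and normalization values simply repeat those used in "Example for Text Detection" later in this tutorial:

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>

int main()
{
    // Placeholder model path; parameters mirror the DB example below.
    cv::dnn::TextDetectionModel_DB detector("path/to/DB_TD500_resnet50.onnx");
    detector.setInputParams(1.0 / 255.0, cv::Size(736, 736),
                            cv::Scalar(122.67891434, 116.66876762, 104.00698793));

    cv::Mat frame = cv::imread("path/to/text_det_test.png");

    // 4-point quadrangles...
    std::vector<std::vector<cv::Point>> quads;
    detector.detect(frame, quads);

    // ...or rotated rectangles with per-detection confidences.
    std::vector<cv::RotatedRect> rects;
    std::vector<float> confidences;
    detector.detectTextRectangles(frame, rects, confidences);
    return 0;
}
```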
|
||||
|
||||
The pretrained models provided below are variants of DB (without deformable convolution),
|
||||
and their performance is reported in Table 1 of the [paper](https://arxiv.org/abs/1911.08947).
|
||||
For more information, please refer to the [official code](https://github.com/MhLiao/DB).
|
||||
|
||||
---
|
||||
|
||||
You can train your own model with more data, and convert it into ONNX format.
|
||||
We encourage you to add new algorithms to these APIs.
|
||||
|
||||
|
||||
## Pretrained Models
|
||||
|
||||
#### TextRecognitionModel:
|
||||
|
||||
```
|
||||
crnn.onnx:
|
||||
url: https://drive.google.com/uc?export=dowload&id=1ooaLR-rkTl8jdpGy1DoQs0-X0lQsB6Fj
|
||||
    sha: 270d92c9ccb670ada2459a25977e8deeaf8380d3
|
||||
alphabet_36.txt: https://drive.google.com/uc?export=dowload&id=1oPOYx5rQRp8L6XQciUwmwhMCfX0KyO4b
|
||||
parameter setting: -rgb=0;
|
||||
description: The classification number of this model is 36 (0~9 + a~z).
|
||||
The training dataset is MJSynth.
|
||||
|
||||
crnn_cs.onnx:
|
||||
url: https://drive.google.com/uc?export=dowload&id=12diBsVJrS9ZEl6BNUiRp9s0xPALBS7kt
|
||||
sha: a641e9c57a5147546f7a2dbea4fd322b47197cd5
|
||||
alphabet_94.txt: https://drive.google.com/uc?export=dowload&id=1oKXxXKusquimp7XY1mFvj9nwLzldVgBR
|
||||
parameter setting: -rgb=1;
|
||||
description: The classification number of this model is 94 (0~9 + a~z + A~Z + punctuations).
|
||||
                 The training datasets are MJSynth and SynthText.
|
||||
|
||||
crnn_cs_CN.onnx:
|
||||
url: https://drive.google.com/uc?export=dowload&id=1is4eYEUKH7HR7Gl37Sw4WPXx6Ir8oQEG
|
||||
sha: 3940942b85761c7f240494cf662dcbf05dc00d14
|
||||
alphabet_3944.txt: https://drive.google.com/uc?export=dowload&id=18IZUUdNzJ44heWTndDO6NNfIpJMmN-ul
|
||||
parameter setting: -rgb=1;
|
||||
description: The classification number of this model is 3944 (0~9 + a~z + A~Z + Chinese characters + special characters).
|
||||
The training dataset is ReCTS (https://rrc.cvc.uab.es/?ch=12).
|
||||
```
|
||||
|
||||
More models can be found [here](https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing),
|
||||
which are taken from [clovaai](https://github.com/clovaai/deep-text-recognition-benchmark).
|
||||
You can train more models with [CRNN](https://github.com/meijieru/crnn.pytorch), and convert them with `torch.onnx.export`.
|
||||
|
||||
#### TextDetectionModel:
|
||||
|
||||
```
|
||||
- DB_IC15_resnet50.onnx:
|
||||
url: https://drive.google.com/uc?export=dowload&id=17_ABp79PlFt9yPCxSaarVc_DKTmrSGGf
|
||||
sha: bef233c28947ef6ec8c663d20a2b326302421fa3
|
||||
recommended parameter setting: -inputHeight=736, -inputWidth=1280;
|
||||
description: This model is trained on ICDAR2015, so it can only detect English text instances.
|
||||
|
||||
- DB_IC15_resnet18.onnx:
|
||||
url: https://drive.google.com/uc?export=dowload&id=1sZszH3pEt8hliyBlTmB-iulxHP1dCQWV
|
||||
sha: 19543ce09b2efd35f49705c235cc46d0e22df30b
|
||||
recommended parameter setting: -inputHeight=736, -inputWidth=1280;
|
||||
description: This model is trained on ICDAR2015, so it can only detect English text instances.
|
||||
|
||||
- DB_TD500_resnet50.onnx:
|
||||
url: https://drive.google.com/uc?export=dowload&id=19YWhArrNccaoSza0CfkXlA8im4-lAGsR
|
||||
sha: 1b4dd21a6baa5e3523156776970895bd3db6960a
|
||||
recommended parameter setting: -inputHeight=736, -inputWidth=736;
|
||||
description: This model is trained on MSRA-TD500, so it can detect both English and Chinese text instances.
|
||||
|
||||
- DB_TD500_resnet18.onnx:
|
||||
url: https://drive.google.com/uc?export=dowload&id=1vY_KsDZZZb_svd5RT6pjyI8BS1nPbBSX
|
||||
sha: 8a3700bdc13e00336a815fc7afff5dcc1ce08546
|
||||
recommended parameter setting: -inputHeight=736, -inputWidth=736;
|
||||
description: This model is trained on MSRA-TD500, so it can detect both English and Chinese text instances.
|
||||
|
||||
```
|
||||
|
||||
We will release more models of DB [here](https://drive.google.com/drive/folders/1qzNCHfUJOS0NEUOIKn69eCtxdlNPpWbq?usp=sharing) in the future.
|
||||
|
||||
```
|
||||
- EAST:
|
||||
Download link: https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1
|
||||
This model is based on https://github.com/argman/EAST
|
||||
```
|
||||
|
||||
## Images for Testing
|
||||
|
||||
```
|
||||
Text Recognition:
|
||||
url: https://drive.google.com/uc?export=dowload&id=1nMcEy68zDNpIlqAn6xCk_kYcUTIeSOtN
|
||||
sha: 89205612ce8dd2251effa16609342b69bff67ca3
|
||||
|
||||
Text Detection:
|
||||
url: https://drive.google.com/uc?export=dowload&id=149tAhIcvfCYeyufRoZ9tmc2mZDKE_XrF
|
||||
sha: ced3c03fb7f8d9608169a913acf7e7b93e07109b
|
||||
```
|
||||
|
||||
## Example for Text Recognition
|
||||
|
||||
Step1. Loading images and models with a vocabulary
|
||||
|
||||
```cpp
|
||||
// Load a cropped text line image
|
||||
// you can find cropped images for testing in "Images for Testing"
|
||||
int rgb = IMREAD_COLOR; // This should be changed according to the model input requirement.
|
||||
Mat image = imread("path/to/text_rec_test.png", rgb);
|
||||
|
||||
// Load model weights
|
||||
TextRecognitionModel model("path/to/crnn_cs.onnx");
|
||||
|
||||
// The decoding method
|
||||
// more methods will be supported in the future
|
||||
model.setDecodeType("CTC-greedy");
|
||||
|
||||
// Load vocabulary
|
||||
// vocabulary should be changed according to the text recognition model
|
||||
std::ifstream vocFile;
|
||||
vocFile.open("path/to/alphabet_94.txt");
|
||||
CV_Assert(vocFile.is_open());
|
||||
String vocLine;
|
||||
std::vector<String> vocabulary;
|
||||
while (std::getline(vocFile, vocLine)) {
|
||||
vocabulary.push_back(vocLine);
|
||||
}
|
||||
model.setVocabulary(vocabulary);
|
||||
```
|
||||
|
||||
Step2. Setting Parameters
|
||||
|
||||
```cpp
|
||||
// Normalization parameters
|
||||
double scale = 1.0 / 127.5;
|
||||
Scalar mean = Scalar(127.5, 127.5, 127.5);
|
||||
|
||||
// The input shape
|
||||
Size inputSize = Size(100, 32);
|
||||
|
||||
model.setInputParams(scale, inputSize, mean);
|
||||
```
|
||||
Step3. Inference
|
||||
```cpp
|
||||
std::string recognitionResult = model.recognize(image);
|
||||
std::cout << "'" << recognitionResult << "'" << std::endl;
|
||||
```
|
||||
|
||||
Input image:
|
||||
|
||||

|
||||
|
||||
Output:
|
||||
```
|
||||
'welcome'
|
||||
```
|
||||
|
||||
|
||||
## Example for Text Detection
|
||||
|
||||
Step1. Loading images and models
|
||||
```cpp
|
||||
// Load an image
|
||||
// you can find some images for testing in "Images for Testing"
|
||||
Mat frame = imread("/path/to/text_det_test.png");
|
||||
```
|
||||
|
||||
Step2.a Setting Parameters (DB)
|
||||
```cpp
|
||||
// Load model weights
|
||||
TextDetectionModel_DB model("/path/to/DB_TD500_resnet50.onnx");
|
||||
|
||||
// Post-processing parameters
|
||||
float binThresh = 0.3;
|
||||
float polyThresh = 0.5;
|
||||
uint maxCandidates = 200;
|
||||
double unclipRatio = 2.0;
|
||||
model.setBinaryThreshold(binThresh)
|
||||
.setPolygonThreshold(polyThresh)
|
||||
.setMaxCandidates(maxCandidates)
|
||||
.setUnclipRatio(unclipRatio)
|
||||
;
|
||||
|
||||
// Normalization parameters
|
||||
double scale = 1.0 / 255.0;
|
||||
Scalar mean = Scalar(122.67891434, 116.66876762, 104.00698793);
|
||||
|
||||
// The input shape
|
||||
Size inputSize = Size(736, 736);
|
||||
|
||||
model.setInputParams(scale, inputSize, mean);
|
||||
```
|
||||
|
||||
Step2.b Setting Parameters (EAST)
|
||||
```cpp
|
||||
TextDetectionModel_EAST model("EAST.pb");
|
||||
|
||||
float confThreshold = 0.5;
|
||||
float nmsThreshold = 0.4;
|
||||
model.setConfidenceThreshold(confThreshold)
|
||||
     .setNMSThreshold(nmsThreshold)
|
||||
;
|
||||
|
||||
double detScale = 1.0;
|
||||
Size detInputSize = Size(320, 320);
|
||||
Scalar detMean = Scalar(123.68, 116.78, 103.94);
|
||||
bool swapRB = true;
|
||||
model.setInputParams(detScale, detInputSize, detMean, swapRB);
|
||||
```
|
||||
|
||||
|
||||
Step3. Inference
|
||||
```cpp
|
||||
std::vector<std::vector<Point>> detResults;
|
||||
model.detect(frame, detResults);
|
||||
|
||||
// Visualization
|
||||
polylines(frame, detResults, true, Scalar(0, 255, 0), 2);
|
||||
imshow("Text Detection", image);
|
||||
waitKey();
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||

|
||||
|
||||
## Example for Text Spotting
|
||||
|
||||
After following the steps above, it is easy to get the detection results of an input image.
|
||||
Then, you can apply a perspective transformation and crop the text regions for recognition.
|
||||
For more information, please refer to the **Detailed Sample**; a possible implementation of the `fourPointsTransform` helper used below is sketched after the snippet.
|
||||
```cpp
|
||||
// Transform and Crop
|
||||
Mat cropped;
|
||||
fourPointsTransform(recInput, vertices, cropped);
|
||||
|
||||
String recResult = recognizer.recognize(cropped);
|
||||
```
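
`fourPointsTransform` is a small helper defined in the samples rather than in the OpenCV API. One possible implementation is shown below; the fixed 100x32 output size matches the recognizer input used above, and the vertex ordering is an assumption taken from the samples:

```cpp
#include <opencv2/imgproc.hpp>

// Warp one detected quadrangle into an upright 100x32 patch for recognition.
// "vertices" is assumed to be ordered bottom-left, top-left, top-right, bottom-right.
void fourPointsTransform(const cv::Mat& frame, const cv::Point2f vertices[], cv::Mat& result)
{
    const cv::Size outputSize(100, 32);
    const cv::Point2f targetVertices[4] = {
        cv::Point2f(0, (float)(outputSize.height - 1)),
        cv::Point2f(0, 0),
        cv::Point2f((float)(outputSize.width - 1), 0),
        cv::Point2f((float)(outputSize.width - 1), (float)(outputSize.height - 1))
    };
    cv::Mat transform = cv::getPerspectiveTransform(vertices, targetVertices);
    cv::warpPerspective(frame, result, transform, outputSize);
}
```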
|
||||
|
||||
Output Examples:
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
## Source Code
|
||||
The [source code](https://github.com/opencv/opencv/blob/master/modules/dnn/src/model.cpp)
|
||||
of these APIs can be found in the DNN module.
|
||||
|
||||
## Detailed Sample
|
||||
For more information, please refer to:
|
||||
- [samples/dnn/scene_text_recognition.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/scene_text_recognition.cpp)
|
||||
- [samples/dnn/scene_text_detection.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/scene_text_detection.cpp)
|
||||
- [samples/dnn/text_detection.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.cpp)
|
||||
- [samples/dnn/scene_text_spotting.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/scene_text_spotting.cpp)
|
||||
|
||||
#### Test with an image
|
||||
Examples:
|
||||
```bash
|
||||
example_dnn_scene_text_recognition -mp=path/to/crnn_cs.onnx -i=path/to/an/image -rgb=1 -vp=/path/to/alphabet_94.txt
|
||||
example_dnn_scene_text_detection -mp=path/to/DB_TD500_resnet50.onnx -i=path/to/an/image -ih=736 -iw=736
|
||||
example_dnn_scene_text_spotting -dmp=path/to/DB_IC15_resnet50.onnx -rmp=path/to/crnn_cs.onnx -i=path/to/an/image -iw=1280 -ih=736 -rgb=1 -vp=/path/to/alphabet_94.txt
|
||||
example_dnn_text_detection -dmp=path/to/EAST.pb -rmp=path/to/crnn_cs.onnx -i=path/to/an/image -rgb=1 -vp=path/to/alphabet_94.txt
|
||||
```
|
||||
|
||||
#### Test on public datasets
|
||||
Text Recognition:
|
||||
|
||||
The download link for testing images can be found in the **Images for Testing** section.
|
||||
|
||||
|
||||
Examples:
|
||||
```bash
|
||||
example_dnn_scene_text_recognition -mp=path/to/crnn.onnx -e=true -edp=path/to/evaluation_data_rec -vp=/path/to/alphabet_36.txt -rgb=0
|
||||
example_dnn_scene_text_recognition -mp=path/to/crnn_cs.onnx -e=true -edp=path/to/evaluation_data_rec -vp=/path/to/alphabet_94.txt -rgb=1
|
||||
```
|
||||
|
||||
Text Detection:
|
||||
|
||||
The download links for testing images can be found in the **Images for Testing** section.
|
||||
|
||||
Examples:
|
||||
```bash
|
||||
example_dnn_scene_text_detection -mp=path/to/DB_TD500_resnet50.onnx -e=true -edp=path/to/evaluation_data_det/TD500 -ih=736 -iw=736
|
||||
example_dnn_scene_text_detection -mp=path/to/DB_IC15_resnet50.onnx -e=true -edp=path/to/evaluation_data_det/IC15 -ih=736 -iw=1280
|
||||
```
|
||||
BIN
doc/tutorials/dnn/dnn_text_spotting/text_det_test_results.jpg
Normal file
|
After Width: | Height: | Size: 48 KiB |
BIN
doc/tutorials/dnn/dnn_text_spotting/text_rec_test.png
Normal file
|
After Width: | Height: | Size: 2.8 KiB |
54
doc/tutorials/dnn/dnn_yolo/dnn_yolo.markdown
Normal file
@@ -0,0 +1,54 @@
|
||||
YOLO DNNs {#tutorial_dnn_yolo}
|
||||
===============================
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_dnn_android}
|
||||
@next_tutorial{tutorial_dnn_javascript}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Alessandro de Oliveira Faria |
|
||||
| Compatibility | OpenCV >= 3.3.1 |
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
In this tutorial you will learn how to use the opencv_dnn module with the yolo_object_detection sample, which runs the OpenCV dnn module in real time on device capture, video files, and images.
|
||||
|
||||
We will demonstrate the results of this example on the following picture.
|
||||

|
||||
|
||||
Examples
|
||||
--------
|
||||
|
||||
VIDEO DEMO:
|
||||
@youtube{NHtRlndE2cg}
|
||||
|
||||
Source Code
|
||||
-----------
|
||||
|
||||
Use the universal sample for object detection models, written
|
||||
[in C++](https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp) and
|
||||
[in Python](https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.py).
|
||||
|
||||
Usage examples
|
||||
--------------
|
||||
|
||||
Execute in webcam:
|
||||
|
||||
@code{.bash}
|
||||
|
||||
$ example_dnn_object_detection --config=[PATH-TO-DARKNET]/cfg/yolo.cfg --model=[PATH-TO-DARKNET]/yolo.weights --classes=object_detection_classes_pascal_voc.txt --width=416 --height=416 --scale=0.00392 --rgb
|
||||
|
||||
@endcode
|
||||
|
||||
Execute with image or video file:
|
||||
|
||||
@code{.bash}
|
||||
|
||||
$ example_dnn_object_detection --config=[PATH-TO-DARKNET]/cfg/yolo.cfg --model=[PATH-TO-DARKNET]/yolo.weights --classes=object_detection_classes_pascal_voc.txt --width=416 --height=416 --scale=0.00392 --input=[PATH-TO-IMAGE-OR-VIDEO-FILE] --rgb
|
||||
|
||||
@endcode
|
||||
|
||||
Questions and suggestions can be emailed to Alessandro de Oliveira Faria (cabelo@opensuse.org) or the OpenCV Team.
|
||||
BIN
doc/tutorials/dnn/dnn_yolo/images/yolo.jpg
Normal file
|
After Width: | Height: | Size: 210 KiB |
BIN
doc/tutorials/dnn/images/lena_hed.jpg
Normal file
|
After Width: | Height: | Size: 38 KiB |
BIN
doc/tutorials/dnn/images/space_shuttle.jpg
Normal file
|
After Width: | Height: | Size: 27 KiB |
24
doc/tutorials/dnn/table_of_content_dnn.markdown
Normal file
@@ -0,0 +1,24 @@
|
||||
Deep Neural Networks (dnn module) {#tutorial_table_of_content_dnn}
|
||||
=====================================
|
||||
|
||||
- @subpage tutorial_dnn_googlenet
|
||||
- @subpage tutorial_dnn_halide
|
||||
- @subpage tutorial_dnn_halide_scheduling
|
||||
- @subpage tutorial_dnn_android
|
||||
- @subpage tutorial_dnn_yolo
|
||||
- @subpage tutorial_dnn_javascript
|
||||
- @subpage tutorial_dnn_custom_layers
|
||||
- @subpage tutorial_dnn_OCR
|
||||
- @subpage tutorial_dnn_text_spotting
|
||||
|
||||
#### PyTorch models with OpenCV
|
||||
In this section you will find guides that describe how to run classification, segmentation and detection PyTorch DNN models with OpenCV.
|
||||
- @subpage pytorch_cls_tutorial_dnn_conversion
|
||||
- @subpage pytorch_cls_c_tutorial_dnn_conversion
|
||||
- @subpage pytorch_segm_tutorial_dnn_conversion
|
||||
|
||||
#### TensorFlow models with OpenCV
|
||||
In this section you will find guides that describe how to run classification, segmentation and detection TensorFlow DNN models with OpenCV.
|
||||
- @subpage tf_cls_tutorial_dnn_conversion
|
||||
- @subpage tf_det_tutorial_dnn_conversion
|
||||
- @subpage tf_segm_tutorial_dnn_conversion
|
||||