init - initialize project
123
doc/tutorials/core/adding_images/adding_images.markdown
Normal file
@@ -0,0 +1,123 @@
|
||||
Adding (blending) two images using OpenCV {#tutorial_adding_images}
|
||||
=========================================
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_mat_operations}
|
||||
@next_tutorial{tutorial_basic_linear_transform}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Ana Huamán |
|
||||
| Compatibility | OpenCV >= 3.0 |
|
||||
|
||||
We will learn how to blend two images!
|
||||
Goal
|
||||
----
|
||||
|
||||
In this tutorial you will learn:
|
||||
|
||||
- what *linear blending* is and why it is useful;
|
||||
- how to add two images using **addWeighted()**
|
||||
|
||||
Theory
|
||||
------
|
||||
|
||||
@note
|
||||
The explanation below is based on the book [Computer Vision: Algorithms and Applications](http://szeliski.org/Book/) by Richard Szeliski.
|
||||
|
||||
From the previous tutorial, we already know a bit about *pixel operators*. An interesting dyadic (two-input) operator is the *linear blend operator*:
|
||||
|
||||
\f[g(x) = (1 - \alpha)f_{0}(x) + \alpha f_{1}(x)\f]
|
||||
|
||||
By varying \f$\alpha\f$ from \f$0 \rightarrow 1\f$ this operator can be used to perform a temporal
|
||||
*cross-dissolve* between two images or videos, as seen in slide shows and film productions (cool,
|
||||
eh?)
|
||||
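As a rough illustration of the formula above, here is a minimal sketch in plain Python (no OpenCV). The function name `linear_blend` and the tiny 2x2 sample images are assumptions made for this example; images are modelled as lists of rows of gray values.

```python
def linear_blend(f0, f1, alpha):
    """Blend two equally sized grayscale images: g = (1 - alpha) * f0 + alpha * f1."""
    if len(f0) != len(f1) or any(len(r0) != len(r1) for r0, r1 in zip(f0, f1)):
        raise ValueError("both images must have the same size")
    return [
        [(1.0 - alpha) * p0 + alpha * p1 for p0, p1 in zip(row0, row1)]
        for row0, row1 in zip(f0, f1)
    ]


# Two tiny 2x2 "images" (hypothetical sample data).
f0 = [[0, 100], [200, 255]]
f1 = [[255, 100], [0, 55]]

# Cross-dissolve in five steps: alpha = 0 reproduces f0, alpha = 1 reproduces f1.
frames = [linear_blend(f0, f1, step / 4) for step in range(5)]
```

Each entry of `frames` is one intermediate image of the dissolve; the middle frame (`alpha = 0.5`) is the plain average of the two inputs.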
|
||||
Source Code
|
||||
-----------
|
||||
|
||||
@add_toggle_cpp
|
||||
Download the source code from
|
||||
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/core/AddingImages/AddingImages.cpp).
|
||||
@include cpp/tutorial_code/core/AddingImages/AddingImages.cpp
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
Download the source code from
|
||||
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/core/AddingImages/AddingImages.java).
|
||||
@include java/tutorial_code/core/AddingImages/AddingImages.java
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
Download the source code from
|
||||
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/core/AddingImages/adding_images.py).
|
||||
@include python/tutorial_code/core/AddingImages/adding_images.py
|
||||
@end_toggle
|
||||
|
||||
Explanation
|
||||
-----------
|
||||
|
||||
Since we are going to perform:
|
||||
|
||||
\f[g(x) = (1 - \alpha)f_{0}(x) + \alpha f_{1}(x)\f]
|
||||
|
||||
We need two source images (\f$f_{0}(x)\f$ and \f$f_{1}(x)\f$). So, we load them in the usual way:
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/AddingImages/AddingImages.cpp load
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet java/tutorial_code/core/AddingImages/AddingImages.java load
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/AddingImages/adding_images.py load
|
||||
@end_toggle
|
||||
|
||||
We used the following images: [LinuxLogo.jpg](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/LinuxLogo.jpg) and [WindowsLogo.jpg](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/WindowsLogo.jpg)
|
||||
|
||||
@warning Since we are *adding* *src1* and *src2*, they both have to be of the same size
|
||||
(width and height) and type.
|
||||
|
||||
Now we need to generate the `g(x)` image. For this, the function **addWeighted()** comes in quite handy:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/AddingImages/AddingImages.cpp blend_images
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet java/tutorial_code/core/AddingImages/AddingImages.java blend_images
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/AddingImages/adding_images.py blend_images
|
||||
NumPy version of the line above (the OpenCV function is around 2x faster):
|
||||
\code{.py}
|
||||
dst = np.uint8(alpha*(img1)+beta*(img2))
|
||||
\endcode
|
||||
@end_toggle
|
||||
|
||||
since **addWeighted()** produces:
|
||||
\f[dst = \alpha \cdot src1 + \beta \cdot src2 + \gamma\f]
|
||||
In this case, `gamma` is the argument \f$0.0\f$ in the code above.
|
||||
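To make the relation between this weighted sum and the linear blend operator concrete, here is a small plain-Python sketch. The helper `weighted_sum` is a hypothetical name for this example and is not the OpenCV function; it returns floats and leaves out the saturation to the output type that the real function performs.

```python
def weighted_sum(src1, alpha, src2, beta, gamma=0.0):
    """Per-pixel alpha * src1 + beta * src2 + gamma on equally sized lists of rows.

    Returns floats; saturation to the 8-bit range is left out of this sketch.
    """
    return [
        [alpha * a + beta * b + gamma for a, b in zip(row1, row2)]
        for row1, row2 in zip(src1, src2)
    ]


src1 = [[10, 200]]
src2 = [[30, 100]]

# With weights that sum to one and gamma = 0, the weighted sum is a linear blend.
blend = weighted_sum(src1, 0.5, src2, 0.5)

# A non-zero gamma simply adds a constant offset to every pixel.
offset = weighted_sum(src1, 0.5, src2, 0.5, 5.0)
```

Choosing `beta = 1 - alpha` and `gamma = 0` therefore reproduces the blend formula from the Theory section.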
|
||||
Create windows, show the images and wait for the user to end the program.
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/AddingImages/AddingImages.cpp display
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet java/tutorial_code/core/AddingImages/AddingImages.java display
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/AddingImages/adding_images.py display
|
||||
@end_toggle
|
||||
|
||||
Result
|
||||
------
|
||||
|
||||

|
||||
@@ -0,0 +1,325 @@
|
||||
Changing the contrast and brightness of an image! {#tutorial_basic_linear_transform}
|
||||
=================================================
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_adding_images}
|
||||
@next_tutorial{tutorial_discrete_fourier_transform}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Ana Huamán |
|
||||
| Compatibility | OpenCV >= 3.0 |
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this tutorial you will learn how to:
|
||||
|
||||
- Access pixel values
|
||||
- Initialize a matrix with zeros
|
||||
- Use @ref cv::saturate_cast and understand why it is useful
|
||||
- Get some cool info about pixel transformations
|
||||
- Improve the brightness of an image on a practical example
|
||||
|
||||
Theory
|
||||
------
|
||||
|
||||
@note
|
||||
The explanation below is based on the book [Computer Vision: Algorithms and Applications](http://szeliski.org/Book/) by Richard Szeliski.
|
||||
|
||||
### Image Processing
|
||||
|
||||
- A general image processing operator is a function that takes one or more input images and
|
||||
produces an output image.
|
||||
- Image transforms can be seen as:
|
||||
- Point operators (pixel transforms)
|
||||
- Neighborhood (area-based) operators
|
||||
|
||||
### Pixel Transforms
|
||||
|
||||
- In this kind of image processing transform, each output pixel's value depends on only the
|
||||
corresponding input pixel value (plus, potentially, some globally collected information or
|
||||
parameters).
|
||||
- Examples of such operators include *brightness and contrast adjustments* as well as color
|
||||
correction and transformations.
|
||||
|
||||
### Brightness and contrast adjustments
|
||||
|
||||
- Two commonly used point processes are *multiplication* and *addition* with a constant:
|
||||
|
||||
\f[g(x) = \alpha f(x) + \beta\f]
|
||||
|
||||
- The parameters \f$\alpha > 0\f$ and \f$\beta\f$ are often called the *gain* and *bias* parameters;
|
||||
sometimes these parameters are said to control *contrast* and *brightness* respectively.
|
||||
- You can think of \f$f(x)\f$ as the source image pixels and \f$g(x)\f$ as the output image pixels. Then,
|
||||
more conveniently we can write the expression as:
|
||||
|
||||
\f[g(i,j) = \alpha \cdot f(i,j) + \beta\f]
|
||||
|
||||
where \f$i\f$ and \f$j\f$ indicate that the pixel is located in the *i-th* row and *j-th* column.
|
||||
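The gain and bias formula can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `saturate_u8` and `adjust` are hypothetical helper names, pixels are modelled as (B, G, R) tuples, and results are rounded and clamped to the 8-bit range [0, 255].

```python
def saturate_u8(value):
    """Round to the nearest integer and clamp to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))


def adjust(image, alpha, beta):
    """Apply g(i,j) = alpha * f(i,j) + beta to every channel of every pixel."""
    return [
        [tuple(saturate_u8(alpha * channel + beta) for channel in pixel) for pixel in row]
        for row in image
    ]


# A 1x2 image of (B, G, R) pixels (hypothetical sample data).
image = [[(10, 100, 250), (0, 50, 120)]]

# alpha = 2.2 and beta = 50, the values used in the Result section of this tutorial.
result = adjust(image, 2.2, 50)
```

Bright channels such as 100 or 250 end up at 255 here, which shows why clamping is needed whenever the gain pushes values out of range.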
|
||||
Code
|
||||
----
|
||||
|
||||
@add_toggle_cpp
|
||||
- **Downloadable code**: Click
|
||||
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp)
|
||||
|
||||
- The following code performs the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ :
|
||||
@include samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
- **Downloadable code**: Click
|
||||
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java)
|
||||
|
||||
- The following code performs the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ :
|
||||
@include samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
- **Downloadable code**: Click
|
||||
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py)
|
||||
|
||||
- The following code performs the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ :
|
||||
@include samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py
|
||||
@end_toggle
|
||||
|
||||
Explanation
|
||||
-----------
|
||||
|
||||
- We load an image using @ref cv::imread and save it in a Mat object:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-load
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-load
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-load
|
||||
@end_toggle
|
||||
|
||||
- Now, since we will apply some transformations to this image, we need a new Mat object to store the result. We also want it to have the following features:
|
||||
|
||||
- Initial pixel values equal to zero
|
||||
- Same size and type as the original image
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-output
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-output
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-output
|
||||
@end_toggle
|
||||
|
||||
We observe that @ref cv::Mat::zeros returns a Matlab-style zero initializer based on
|
||||
*image.size()* and *image.type()*
|
||||
|
||||
- We now ask the user to enter the values of \f$\alpha\f$ and \f$\beta\f$:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-parameters
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-parameters
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-parameters
|
||||
@end_toggle
|
||||
|
||||
- Now, to perform the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ we will access each pixel in the image. Since we are operating with BGR images, we have three values per pixel (B, G and R), so we will also access them separately. Here is the piece of code:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-operation
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-operation
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-operation
|
||||
@end_toggle
|
||||
|
||||
Notice the following (**C++ code only**):
|
||||
- To access each pixel in the images we are using this syntax: *image.at\<Vec3b\>(y,x)[c]*
|
||||
where *y* is the row, *x* is the column and *c* is B, G or R (0, 1 or 2).
|
||||
- Since the operation \f$\alpha \cdot p(i,j) + \beta\f$ can give values that are out of range or not integers (if \f$\alpha\f$ is a float), we use cv::saturate_cast to make sure the values are valid.
|
||||
|
||||
- Finally, we create windows and show the images, the usual way.
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-display
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-display
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-display
|
||||
@end_toggle
|
||||
|
||||
@note
|
||||
Instead of using the **for** loops to access each pixel, we could have simply used this command:
|
||||
|
||||
@add_toggle_cpp
|
||||
@code{.cpp}
|
||||
image.convertTo(new_image, -1, alpha, beta);
|
||||
@endcode
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@code{.java}
|
||||
image.convertTo(newImage, -1, alpha, beta);
|
||||
@endcode
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@code{.py}
|
||||
new_image = cv.convertScaleAbs(image, alpha=alpha, beta=beta)
|
||||
@endcode
|
||||
@end_toggle
|
||||
|
||||
where @ref cv::Mat::convertTo would effectively perform *new_image = alpha\*image + beta*. However, we wanted to show you how to access each pixel. In any case, both methods give the same result, but convertTo is more optimized and works a lot faster.
|
||||
|
||||
Result
|
||||
------
|
||||
|
||||
- Running our code and using \f$\alpha = 2.2\f$ and \f$\beta = 50\f$
|
||||
@code{.bash}
|
||||
$ ./BasicLinearTransforms lena.jpg
|
||||
Basic Linear Transforms
|
||||
-------------------------
|
||||
* Enter the alpha value [1.0-3.0]: 2.2
|
||||
* Enter the beta value [0-100]: 50
|
||||
@endcode
|
||||
|
||||
- We get this:
|
||||
|
||||

|
||||
|
||||
Practical example
|
||||
----
|
||||
|
||||
In this section, we will put into practice what we have learned to correct an underexposed image by adjusting the brightness and the contrast of the image. We will also see another technique to correct the brightness of an image, called gamma correction.
|
||||
|
||||
### Brightness and contrast adjustments
|
||||
|
||||
Increasing (or decreasing) the \f$\beta\f$ value adds (or subtracts) a constant value to every pixel. Pixel values outside the [0, 255] range are saturated (i.e. a pixel value higher than 255 is clamped to 255 and a value lower than 0 is clamped to 0).
|
||||
|
||||

|
||||
|
||||
The histogram shows, for each color level, the number of pixels with that level. A dark image has many pixels with low color values, so its histogram presents a peak in its left part. Adding a constant bias to all the pixels shifts the histogram to the right.
|
||||
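The histogram shift can be checked on a handful of values. The sketch below is plain Python with made-up sample pixels; `add_bias` is a hypothetical helper that adds the bias and clamps to [0, 255].

```python
from collections import Counter


def add_bias(pixels, beta):
    """Add a constant bias to every pixel and clamp the result to [0, 255]."""
    return [max(0, min(255, p + beta)) for p in pixels]


# A mostly dark "image", flattened to a list of gray levels (hypothetical data).
dark_pixels = [10, 10, 20, 30, 250]
brighter_pixels = add_bias(dark_pixels, 40)

# Histograms: gray level -> number of pixels with that level.
hist_before = Counter(dark_pixels)
hist_after = Counter(brighter_pixels)
```

The peak that sat at level 10 now sits at level 50, and the single bright pixel (250) has been clamped to 255, which is the saturation described above.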
|
||||
The \f$\alpha\f$ parameter will modify how the levels spread. If \f$ \alpha < 1 \f$, the color levels will be compressed and the result
|
||||
will be an image with less contrast.
|
||||
|
||||

|
||||
|
||||
Note that these histograms have been obtained using the Brightness-Contrast tool in GIMP. The brightness tool should be identical to the \f$\beta\f$ bias parameter, but the contrast tool seems to differ from the \f$\alpha\f$ gain: with GIMP the output range seems to be centered (as you can notice in the previous histogram).
|
||||
|
||||
Adjusting the \f$\beta\f$ bias can improve the brightness, but at the same time the image will appear with a slight veil because the contrast is reduced. The \f$\alpha\f$ gain can be used to diminish this effect, but due to saturation we will lose some details in the originally bright regions.
|
||||
|
||||
### Gamma correction
|
||||
|
||||
[Gamma correction](https://en.wikipedia.org/wiki/Gamma_correction) can be used to correct the brightness of an image by using a non-linear transformation between the input values and the mapped output values:
|
||||
|
||||
\f[O = \left( \frac{I}{255} \right)^{\gamma} \times 255\f]
|
||||
|
||||
As this relation is non-linear, the effect will not be the same for all the pixels and will depend on their original value.
|
||||
|
||||

|
||||
|
||||
When \f$ \gamma < 1 \f$, the original dark regions will be brighter and the histogram will be shifted to the right whereas it will
|
||||
be the opposite with \f$ \gamma > 1 \f$.
|
||||
|
||||
### Correct an underexposed image
|
||||
|
||||
The following image has been corrected with: \f$ \alpha = 1.3 \f$ and \f$ \beta = 40 \f$.
|
||||
|
||||
![By Visem (Own work) [CC BY-SA 3.0], via Wikimedia Commons](images/Basic_Linear_Transform_Tutorial_linear_transform_correction.jpg)
|
||||
|
||||
The overall brightness has been improved but you can notice that the clouds are now greatly saturated due to the numerical saturation
|
||||
of the implementation used ([highlight clipping](https://en.wikipedia.org/wiki/Clipping_(photography)) in photography).
|
||||
|
||||
The following image has been corrected with: \f$ \gamma = 0.4 \f$.
|
||||
|
||||
![By Visem (Own work) [CC BY-SA 3.0], via Wikimedia Commons](images/Basic_Linear_Transform_Tutorial_gamma_correction.jpg)
|
||||
|
||||
Gamma correction should tend to add less saturation, as the mapping is non-linear and numerical saturation is not possible as it is in the previous method.
|
||||
|
||||

|
||||
|
||||
The previous figure compares the histograms for the three images (the y-ranges are not the same between the three histograms).
|
||||
You can notice that most of the pixel values are in the lower part of the histogram for the original image. After \f$ \alpha \f$, \f$ \beta \f$ correction, we can observe a big peak at 255 due to the saturation as well as a shift to the right.
|
||||
After gamma correction, the histogram is shifted to the right but the pixels in the dark regions are more shifted
|
||||
(see the gamma curves [figure](Basic_Linear_Transform_Tutorial_gamma.png)) than those in the bright regions.
|
||||
|
||||
In this tutorial, you have seen two simple methods to adjust the contrast and the brightness of an image. **They are basic techniques and are not intended to be used as a replacement for a raster graphics editor!**
|
||||
|
||||
### Code
|
||||
|
||||
@add_toggle_cpp
|
||||
Code for the tutorial is [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/ImgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.cpp).
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
Code for the tutorial is [here](https://github.com/opencv/opencv/blob/master/samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/ChangingContrastBrightnessImageDemo.java).
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
Code for the tutorial is [here](https://github.com/opencv/opencv/blob/master/samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.py).
|
||||
@end_toggle
|
||||
|
||||
Code for the gamma correction:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/ImgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.cpp changing-contrast-brightness-gamma-correction
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/ChangingContrastBrightnessImageDemo.java changing-contrast-brightness-gamma-correction
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.py changing-contrast-brightness-gamma-correction
|
||||
@end_toggle
|
||||
|
||||
A look-up table is used to improve the performance of the computation, as only 256 values need to be calculated once.
|
||||
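The look-up table idea can be sketched without OpenCV. The following plain-Python example is an illustration only: `build_gamma_lut` is a hypothetical helper name, and the 2x2 sample image is made up. It evaluates \f$O = (I/255)^{\gamma} \times 255\f$ once per gray level and then maps pixels by indexing.

```python
def build_gamma_lut(gamma):
    """Precompute O = (I / 255) ** gamma * 255 for every 8-bit input level I."""
    return [
        max(0, min(255, int(round((level / 255.0) ** gamma * 255.0))))
        for level in range(256)
    ]


# gamma = 0.4 is the value used for the corrected image in this tutorial.
lut = build_gamma_lut(0.4)

# Applying the correction is now a table lookup per pixel (hypothetical 2x2 image).
image = [[0, 64], [192, 255]]
corrected = [[lut[pixel] for pixel in row] for row in image]
```

With \f$\gamma < 1\f$ the dark level 64 is raised by far more than the bright level 192, while 0 and 255 stay fixed, so no value can leave the [0, 255] range.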
|
||||
### Additional resources
|
||||
|
||||
- [Gamma correction in graphics rendering](https://learnopengl.com/#!Advanced-Lighting/Gamma-Correction)
|
||||
- [Gamma correction and images displayed on CRT monitors](http://www.graphics.cornell.edu/~westin/gamma/gamma.html)
|
||||
- [Digital exposure techniques](http://www.cambridgeincolour.com/tutorials/digital-exposure-techniques.htm)
|
||||
@@ -0,0 +1,245 @@
|
||||
Discrete Fourier Transform {#tutorial_discrete_fourier_transform}
|
||||
==========================
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_basic_linear_transform}
|
||||
@next_tutorial{tutorial_file_input_output_with_xml_yml}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Bernát Gábor |
|
||||
| Compatibility | OpenCV >= 3.0 |
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
We'll seek answers to the following questions:
|
||||
|
||||
- What is a Fourier transform and why use it?
|
||||
- How to do it in OpenCV?
|
||||
- Usage of functions such as: **copyMakeBorder()** , **merge()** , **dft()** ,
|
||||
**getOptimalDFTSize()** , **log()** and **normalize()** .
|
||||
|
||||
Source code
|
||||
-----------
|
||||
|
||||
@add_toggle_cpp
|
||||
You can [download this from here
|
||||
](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp) or
|
||||
find it in the
|
||||
`samples/cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp` of the
|
||||
OpenCV source code library.
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
You can [download this from here
|
||||
](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java) or
|
||||
find it in the
|
||||
`samples/java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java` of the
|
||||
OpenCV source code library.
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
You can [download this from here
|
||||
](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py) or
|
||||
find it in the
|
||||
`samples/python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py` of the
|
||||
OpenCV source code library.
|
||||
@end_toggle
|
||||
|
||||
Here's a sample usage of **dft()** :
|
||||
|
||||
@add_toggle_cpp
|
||||
@include cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@include java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@include python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py
|
||||
@end_toggle
|
||||
|
||||
Explanation
|
||||
-----------
|
||||
|
||||
The Fourier Transform will decompose an image into its sine and cosine components. In other words, it will transform an image from its spatial domain to its frequency domain. The idea is that any function may be approximated exactly by an infinite sum of sine and cosine functions. The Fourier Transform is a way to do this. Mathematically, the Fourier transform of a two-dimensional image is:
|
||||
|
||||
\f[F(k,l) = \displaystyle\sum\limits_{i=0}^{N-1}\sum\limits_{j=0}^{N-1} f(i,j)e^{-i2\pi(\frac{ki}{N}+\frac{lj}{N})}\f]\f[e^{ix} = \cos{x} + i\sin {x}\f]
|
||||
|
||||
Here f is the image value in its spatial domain and F in its frequency domain. The result of the transformation is complex numbers. These can be displayed either via a *real* image and an *imaginary* image, or via a *magnitude* and a *phase* image. However, in image processing algorithms usually only the *magnitude* image is of interest, as it contains all the information we need about the image's geometric structure. Nevertheless, if you intend to modify the image in these forms and then transform it back, you'll need to preserve both of these.
|
||||
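To see the formula at work, here is a deliberately naive plain-Python sketch that evaluates the double sum directly on a tiny image. It is meant only as an illustration: `dft2` is a hypothetical name, the sketch allows different row and column counts (the formula above uses a single N), and it is far slower than an optimized DFT routine.

```python
import cmath


def dft2(image):
    """Direct evaluation of the 2D DFT double sum on a list of rows."""
    rows, cols = len(image), len(image[0])
    result = []
    for k in range(rows):
        out_row = []
        for l in range(cols):
            total = 0j
            for i in range(rows):
                for j in range(cols):
                    angle = -2.0 * cmath.pi * (k * i / rows + l * j / cols)
                    total += image[i][j] * cmath.exp(1j * angle)
            out_row.append(total)
        result.append(out_row)
    return result


# A tiny 2x2 gray scale image (hypothetical sample data).
image = [[1, 2], [3, 4]]
spectrum = dft2(image)

# Magnitude image: sqrt(Re^2 + Im^2) of every complex coefficient.
magnitude = [[abs(value) for value in row] for row in spectrum]
```

For this input the coefficient at (0, 0) is the sum of all pixels (10), and the magnitudes come out close to `[[10, 2], [4, 0]]` up to floating-point error.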
|
||||
In this sample I'll show how to calculate and show the *magnitude* image of a Fourier Transform. Digital images are discrete. This means they may take values only from a given finite domain. For example, in a basic gray scale image values usually are between zero and 255. Therefore the Fourier Transform too needs to be of a discrete type, resulting in a Discrete Fourier Transform (*DFT*). You'll want to use this whenever you need to determine the structure of an image from a geometrical point of view. Here are the steps to follow (in case of a gray scale input image *I*):
|
||||
|
||||
#### Expand the image to an optimal size
|
||||
|
||||
The performance of a DFT depends on the image size. It tends to be the fastest for image sizes that are products of powers of two, three and five. Therefore, to achieve maximal performance it is generally a good idea to pad border values to the image to get a size with such traits. The **getOptimalDFTSize()** function returns this optimal size and we can use the **copyMakeBorder()** function to expand the borders of an image (the appended pixels are initialized with zero):
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp expand
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java expand
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py expand
|
||||
@end_toggle
|
||||
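The idea behind the optimal size can be illustrated in plain Python. This sketch assumes that "optimal" means the smallest size not below the input whose only prime factors are two, three and five; `optimal_size` is a hypothetical helper and is not how OpenCV implements its function.

```python
def has_only_factors_2_3_5(n):
    """True if n can be written as 2**p * 3**q * 5**r."""
    for prime in (2, 3, 5):
        while n % prime == 0:
            n //= prime
    return n == 1


def optimal_size(size):
    """Smallest n >= size whose only prime factors are 2, 3 and 5."""
    n = max(1, size)
    while not has_only_factors_2_3_5(n):
        n += 1
    return n


# Padding needed for a hypothetical 101 x 13 image.
rows, cols = 101, 13
pad_bottom = optimal_size(rows) - rows
pad_right = optimal_size(cols) - cols
```

Here 101 rows would be padded to 108 and 13 columns to 15; those differences are the border widths one would add at the bottom and on the right.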
|
||||
#### Make place for both the complex and the real values
|
||||
|
||||
The result of a Fourier Transform is complex. This implies that for each image value the result is two image values (one per component). Moreover, the frequency domain's range is much larger than its spatial counterpart. Therefore, we usually store these values at least in a *float* format. We'll thus convert our input image to this type and expand it with another channel to hold the complex values:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp complex_and_real
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java complex_and_real
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py complex_and_real
|
||||
@end_toggle
|
||||
|
||||
#### Make the Discrete Fourier Transform
|
||||
An in-place calculation (same input as output) is possible:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp dft
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java dft
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py dft
|
||||
@end_toggle
|
||||
|
||||
#### Transform the real and complex values to magnitude
|
||||
A complex number has a real (*Re*) and an imaginary (*Im*) part. The results of a DFT are complex numbers. The magnitude of a DFT is:
|
||||
|
||||
\f[M = \sqrt[2]{ {Re(DFT(I))}^2 + {Im(DFT(I))}^2}\f]
|
||||
|
||||
Translated to OpenCV code:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp magnitude
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java magnitude
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py magnitude
|
||||
@end_toggle
|
||||
|
||||
#### Switch to a logarithmic scale
|
||||
It turns out that the dynamic range of the Fourier coefficients is too large to be displayed on the screen. Some values are small and some are very large, and we can't observe both on a linear scale: the high values would all turn out as white points, and the small ones as black. To use the gray scale values for visualization we can transform our linear scale to a logarithmic one:
|
||||
|
||||
\f[M_1 = \log{(1 + M)}\f]
|
||||
|
||||
Translated to OpenCV code:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp log
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java log
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py log
|
||||
@end_toggle
|
||||
|
||||
#### Crop and rearrange
|
||||
Remember that in the first step we expanded the image? Well, it's time to throw away the newly introduced values. For visualization purposes we may also rearrange the quadrants of the result, so that the origin (zero, zero) corresponds to the image center.
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp crop_rearrange
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java crop_rearrange
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py crop_rearrange
|
||||
@end_toggle
|
||||
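The quadrant rearrangement can be sketched in plain Python on a small matrix. Both helpers are hypothetical names for this illustration; cropping to even dimensions first is an assumption of this sketch, made so that the four quadrants have equal size.

```python
def crop_to_even(mat):
    """Drop the last row and/or column if the dimension is odd."""
    rows = len(mat) & -2
    cols = len(mat[0]) & -2
    return [row[:cols] for row in mat[:rows]]


def swap_quadrants(mat):
    """Swap top-left with bottom-right and top-right with bottom-left.

    Expects even dimensions, so a shift by half the size in each
    direction is exactly the quadrant swap.
    """
    rows, cols = len(mat), len(mat[0])
    half_r, half_c = rows // 2, cols // 2
    return [
        [mat[(r + half_r) % rows][(c + half_c) % cols] for c in range(cols)]
        for r in range(rows)
    ]


# A 5x5 matrix of consecutive numbers (hypothetical data), cropped to 4x4.
mat = [[r * 5 + c for c in range(5)] for r in range(5)]
even = crop_to_even(mat)
centered = swap_quadrants(even)
```

After the swap, the value that was at the top-left corner (the origin) sits at the center position `centered[2][2]`.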
|
||||
#### Normalize
|
||||
This is done again for visualization purposes. We now have the magnitudes; however, these are still out of our image display range of zero to one. We normalize our values to this range using the @ref cv::normalize() function.
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp normalize
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java normalize
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py normalize
|
||||
@end_toggle
|
||||
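The logarithmic scaling and the normalization steps can be sketched together in plain Python. This is an illustration only: `log_scale` and `normalize_min_max` are hypothetical helpers, the sample magnitudes are made up, and min-max scaling to [0, 1] is assumed as the normalization.

```python
import math


def log_scale(magnitude):
    """Apply M1 = log(1 + M) to every magnitude value."""
    return [[math.log(1.0 + m) for m in row] for row in magnitude]


def normalize_min_max(mat):
    """Linearly rescale all values so the minimum maps to 0 and the maximum to 1."""
    flat = [value for row in mat for value in row]
    lowest, highest = min(flat), max(flat)
    if highest == lowest:
        return [[0.0 for _ in row] for row in mat]
    return [[(value - lowest) / (highest - lowest) for value in row] for row in mat]


# Magnitudes spanning three orders of magnitude (hypothetical data).
magnitude = [[0.0, 9.0], [99.0, 999.0]]
display = normalize_min_max(log_scale(magnitude))
```

On a linear scale the first three values would all look nearly black next to 999; after the log step they land at roughly 0, 1/3, 2/3 and 1, so each is distinguishable.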
|
||||
Result
|
||||
------
|
||||
|
||||
An application idea would be to determine the geometrical orientation present in the image. For example, let us find out whether a text is horizontal or not. Looking at some text you'll notice that the text lines form roughly horizontal lines and the letters form roughly vertical lines. These two main components of a text snippet may also be seen in the Fourier transform. Let us use [this horizontal](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/imageTextN.png) and [this rotated](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/imageTextR.png) image of a text.
|
||||
|
||||
In case of the horizontal text:
|
||||
|
||||

|
||||
|
||||
In case of a rotated text:
|
||||
|
||||

|
||||
|
||||
You can see that the most influential components of the frequency domain (brightest dots on the
magnitude image) follow the geometric rotation of objects on the image. From this we may calculate
the offset and perform an image rotation to correct any misalignment.
|
||||
|
@@ -0,0 +1,296 @@
|
||||
File Input and Output using XML and YAML files {#tutorial_file_input_output_with_xml_yml}
|
||||
==============================================
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_discrete_fourier_transform}
|
||||
@next_tutorial{tutorial_how_to_use_OpenCV_parallel_for_}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Bernát Gábor |
|
||||
| Compatibility | OpenCV >= 3.0 |
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
You'll find answers for the following questions:
|
||||
|
||||
- How to write text entries to, and read them from, a YAML or XML file using OpenCV?
|
||||
- How to do the same for OpenCV data structures?
|
||||
- How to do this for your data structures?
|
||||
- Usage of OpenCV data structures such as @ref cv::FileStorage , @ref cv::FileNode or @ref
|
||||
cv::FileNodeIterator .
|
||||
|
||||
Source code
|
||||
-----------
|
||||
@add_toggle_cpp
|
||||
You can [download this from here
|
||||
](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/core/file_input_output/file_input_output.cpp) or find it in the
|
||||
`samples/cpp/tutorial_code/core/file_input_output/file_input_output.cpp` of the OpenCV source code
|
||||
library.
|
||||
|
||||
Here's sample code that achieves everything listed in the goals above.
|
||||
|
||||
@include cpp/tutorial_code/core/file_input_output/file_input_output.cpp
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
You can [download this from here
|
||||
](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/core/file_input_output/file_input_output.py) or find it in the
|
||||
`samples/python/tutorial_code/core/file_input_output/file_input_output.py` of the OpenCV source code
|
||||
library.
|
||||
|
||||
Here's sample code that achieves everything listed in the goals above.
|
||||
|
||||
@include python/tutorial_code/core/file_input_output/file_input_output.py
|
||||
@end_toggle
|
||||
|
||||
Explanation
|
||||
-----------
|
||||
|
||||
Here we talk only about XML and YAML files. Your output file (and its respective input) may
have only one of these extensions, and its structure follows from the chosen format. There are two
kinds of data structures you may serialize: *mappings* (like the STL map and the Python dictionary)
and *element sequences* (like the STL vector). The difference between these is that in a map every
element has a unique name through which you may access it. For sequences you need to go through
them to query a specific item.
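
The difference between the two kinds of structure can be sketched with the STL containers mentioned above. This is a plain C++ illustration, not OpenCV code; the helper names `lookupByName` and `findInSequence` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A mapping: every element is reachable directly through its unique name.
int lookupByName(const std::map<std::string, int>& mapping, const std::string& name)
{
    const auto it = mapping.find(name);
    assert(it != mapping.end());  // the name must exist in this sketch
    return it->second;
}

// A sequence: elements have no names, so we walk the items to find one.
// Returns the position of the first match, or -1 if there is none.
int findInSequence(const std::vector<std::string>& sequence, const std::string& wanted)
{
    for (std::size_t i = 0; i < sequence.size(); ++i)
        if (sequence[i] == wanted)
            return static_cast<int>(i);
    return -1;
}
```

With a mapping holding `One: 1` and `Two: 2`, `lookupByName(mapping, "Two")` goes straight to the value, whereas finding `baboon.jpg` in a sequence of strings requires stepping through the items.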
|
||||
|
||||
-# **XML/YAML File Open and Close.** Before you write any content to such a file you need to open it,
and at the end you need to close it. The XML/YAML data structure in OpenCV is @ref cv::FileStorage . To
specify which file on your hard drive this structure binds to, you can use either its
constructor or its *open()* function:
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp open
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py open
|
||||
@end_toggle
|
||||
Whichever of these you use, the second argument is a constant specifying the type of operations
you'll be able to perform on the file: WRITE, READ or APPEND. The extension specified in the file name
also determines the output format that will be used. The output may even be compressed if you
specify an extension such as *.xml.gz*.
|
||||
|
||||
The file automatically closes when the @ref cv::FileStorage object is destroyed. However, you
may explicitly close it by using the *release* function:
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp close
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py close
|
||||
@end_toggle
|
||||
-# **Input and Output of text and numbers.** In C++, the data structure uses the \<\< output
operator, in the same way as the STL streams. In Python, @ref cv::FileStorage.write() is used instead.
To output any type of data structure we first need to specify its name. In C++ we do this by
simply pushing the name to the stream. In Python, the first parameter of the
write function is the name. For basic types you may follow this with the value itself:
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp writeNum
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py writeNum
|
||||
@end_toggle
|
||||
Reading in is a simple addressing (via the [] operator) and casting operation or a read via
|
||||
the \>\> operator. In Python, we address with getNode() and use real() :
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp readNum
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py readNum
|
||||
@end_toggle
|
||||
-# **Input/Output of OpenCV Data structures.** These behave exactly like the basic C++
and Python types:
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp iomati
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp iomatw
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp iomat
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py iomati
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py iomatw
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py iomat
|
||||
@end_toggle
|
||||
-# **Input/Output of vectors (arrays) and associative maps.** As mentioned beforehand, we can
output maps and sequences (array, vector) too. Again we first print the name of the variable and
then we have to specify whether our output is a sequence or a map.
|
||||
|
||||
For a sequence, print the "[" character before the first element and the "]" character after
the last one. With Python, call `FileStorage.startWriteStruct(structure_name, struct_type)`,
where `struct_type` is `cv2.FileNode_MAP` or `cv2.FileNode_SEQ`, to start writing the structure.
Call `FileStorage.endWriteStruct()` to finish the structure:
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp writeStr
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py writeStr
|
||||
@end_toggle
|
||||
For maps the drill is the same however now we use the "{" and "}" delimiter characters:
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp writeMap
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py writeMap
|
||||
@end_toggle
|
||||
To read from these we use the @ref cv::FileNode and the @ref cv::FileNodeIterator data
|
||||
structures. The [] operator of the @ref cv::FileStorage class (or the getNode() function in Python) returns a @ref cv::FileNode data
|
||||
type. If the node is sequential we can use the @ref cv::FileNodeIterator to iterate through the
|
||||
items. In Python, the at() function can be used to address elements of the sequence and the
|
||||
size() function returns the length of the sequence:
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp readStr
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py readStr
|
||||
@end_toggle
|
||||
For maps you can use the [] operator (at() function in Python) again to access the given item (or the \>\> operator too):
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp readMap
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py readMap
|
||||
@end_toggle
|
||||
-# **Read and write your own data structures.** Suppose you have a data structure such as:
|
||||
@add_toggle_cpp
|
||||
@code{.cpp}
|
||||
class MyData
{
public:
    MyData() : A(0), X(0), id() {}
public:   // Data Members
    int A;
    double X;
    string id;
};
|
||||
@endcode
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@code{.py}
|
||||
class MyData:
    def __init__(self):
        self.A = self.X = 0
        self.name = ''
|
||||
@endcode
|
||||
@end_toggle
|
||||
In C++, it's possible to serialize this through the OpenCV I/O XML/YAML interface (just as
|
||||
in case of the OpenCV data structures) by adding a read and a write function inside and outside of your
|
||||
class. In Python, you can get close to this by implementing a read and write function inside
|
||||
the class. For the inside part:
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp inside
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py inside
|
||||
@end_toggle
|
||||
@add_toggle_cpp
|
||||
In C++, you need to add the following function definitions outside the class:
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp outside
|
||||
@end_toggle
|
||||
Here you can observe that in the read section we defined what happens if the user tries to read
a non-existing node. In this case we just return the default initialization value; however, a
more explicit solution would be to return, for instance, a value of minus one for an object ID.
|
||||
|
||||
Once you have added these four functions, use the \<\< operator for write and the \>\> operator for
read (or the defined input/output functions for Python):
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp customIOi
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp customIOw
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp customIO
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py customIOi
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py customIOw
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py customIO
|
||||
@end_toggle
|
||||
Or try reading a non-existing node:
|
||||
@add_toggle_cpp
|
||||
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp nonexist
|
||||
@end_toggle
|
||||
@add_toggle_python
|
||||
@snippet python/tutorial_code/core/file_input_output/file_input_output.py nonexist
|
||||
@end_toggle
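
The fall-back-to-default behaviour described for the custom data structure step can be sketched without OpenCV. In the sketch below a `std::map` stands in for the storage file, and the names `FakeStorage` and `readOrDefault` are hypothetical; this is an illustration of the idea, not the cv::FileStorage API.

```cpp
#include <cassert>
#include <map>
#include <string>

struct MyData
{
    MyData() : A(0), X(0.0), id() {}
    int A;
    double X;
    std::string id;
};

// Stand-in for a storage file (assumption): a map from node name to stored value.
typedef std::map<std::string, MyData> FakeStorage;

// Copy the named entry into `out`, or fall back to `defaultValue` if it is absent.
void readOrDefault(const FakeStorage& storage, const std::string& name,
                   MyData& out, const MyData& defaultValue = MyData())
{
    assert(!name.empty());
    const FakeStorage::const_iterator it = storage.find(name);
    out = (it == storage.end()) ? defaultValue : it->second;
}
```

Reading a name that was never written leaves the caller with a default-initialized object (`A = 0`, `X = 0`, empty `id`), which matches the `NonExisting` output shown in the result section.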
|
||||
|
||||
Result
|
||||
------
|
||||
|
||||
Mostly we just print out the defined numbers. On your console you should see:
|
||||
@code{.bash}
|
||||
Write Done.
|
||||
|
||||
Reading:
|
||||
100image1.jpg
|
||||
Awesomeness
|
||||
baboon.jpg
|
||||
Two 2; One 1
|
||||
|
||||
|
||||
R = [1, 0, 0;
|
||||
0, 1, 0;
|
||||
0, 0, 1]
|
||||
T = [0; 0; 0]
|
||||
|
||||
MyData =
|
||||
{ id = mydata1234, X = 3.14159, A = 97}
|
||||
|
||||
Attempt to read NonExisting (should initialize the data structure with its default).
|
||||
NonExisting =
|
||||
{ id = , X = 0, A = 0}
|
||||
|
||||
Tip: Open up output.xml with a text editor to see the serialized data.
|
||||
@endcode
|
||||
Nevertheless, it's much more interesting what you may see in the output xml file:
|
||||
@code{.xml}
|
||||
<?xml version="1.0"?>
|
||||
<opencv_storage>
|
||||
<iterationNr>100</iterationNr>
|
||||
<strings>
|
||||
image1.jpg Awesomeness baboon.jpg</strings>
|
||||
<Mapping>
|
||||
<One>1</One>
|
||||
<Two>2</Two></Mapping>
|
||||
<R type_id="opencv-matrix">
|
||||
<rows>3</rows>
|
||||
<cols>3</cols>
|
||||
<dt>u</dt>
|
||||
<data>
|
||||
1 0 0 0 1 0 0 0 1</data></R>
|
||||
<T type_id="opencv-matrix">
|
||||
<rows>3</rows>
|
||||
<cols>1</cols>
|
||||
<dt>d</dt>
|
||||
<data>
|
||||
0. 0. 0.</data></T>
|
||||
<MyData>
|
||||
<A>97</A>
|
||||
<X>3.1415926535897931e+000</X>
|
||||
<id>mydata1234</id></MyData>
|
||||
</opencv_storage>
|
||||
@endcode
|
||||
Or the YAML file:
|
||||
@code{.yaml}
|
||||
%YAML:1.0
iterationNr: 100
strings:
   - "image1.jpg"
   - Awesomeness
   - "baboon.jpg"
Mapping:
   One: 1
   Two: 2
R: !!opencv-matrix
   rows: 3
   cols: 3
   dt: u
   data: [ 1, 0, 0, 0, 1, 0, 0, 0, 1 ]
T: !!opencv-matrix
   rows: 3
   cols: 1
   dt: d
   data: [ 0., 0., 0. ]
MyData:
   A: 97
   X: 3.1415926535897931e+000
   id: mydata1234
|
||||
@endcode
|
||||
You may observe a runtime instance of this on the [YouTube
|
||||
here](https://www.youtube.com/watch?v=A4yqVnByMMM) .
|
||||
|
||||
@youtube{A4yqVnByMMM}
|
||||
@@ -0,0 +1,227 @@
|
||||
How to scan images, lookup tables and time measurement with OpenCV {#tutorial_how_to_scan_images}
|
||||
==================================================================
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_mat_the_basic_image_container}
|
||||
@next_tutorial{tutorial_mat_mask_operations}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Bernát Gábor |
|
||||
| Compatibility | OpenCV >= 3.0 |
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
We'll seek answers for the following questions:
|
||||
|
||||
- How to go through each and every pixel of an image?
|
||||
- How are OpenCV matrix values stored?
|
||||
- How to measure the performance of our algorithm?
|
||||
- What are lookup tables and why use them?
|
||||
|
||||
Our test case
|
||||
-------------
|
||||
|
||||
Let us consider a simple color reduction method. By using the unsigned char C and C++ type for
matrix item storing, a channel of a pixel may have up to 256 different values. For a three channel
image this allows the formation of a great many colors (256^3, about 16.7 million). Working with so
many color shades may give a heavy blow to our algorithm performance. However, sometimes it is
enough to work with a lot fewer of them to get the same final result.
|
||||
|
||||
In these cases it's common to make a *color space reduction*. This means that we divide the
current color space value by a new input value to end up with fewer colors. For instance every
value between zero and nine takes the new value zero, every value between ten and nineteen the value
ten and so on.
|
||||
|
||||
When you divide a *uchar* (unsigned char, that is, a value between zero and 255) by an *int*
value, C and C++ perform an integer division, so the result is again a whole number. Therefore, any
fraction will be rounded down. Taking advantage of this fact, the operation above in the *uchar* domain
may be expressed as:
|
||||
|
||||
\f[I_{new} = (\frac{I_{old}}{10}) * 10\f]
|
||||
|
||||
A simple color space reduction algorithm would consist of just passing through every pixel of an
image matrix and applying this formula. It's worth noting that we do a division and a multiplication
operation. These operations are comparatively expensive for a system. If possible it's worth avoiding
them by using cheaper operations such as a few subtractions, additions or, in the best case, a simple
assignment. Furthermore, note that we only have a limited number of input values for the operation
above. In case of the *uchar* system this is 256 to be exact.
|
||||
|
||||
Therefore, for larger images it would be wise to calculate all possible values beforehand and, during
the scan, just assign the precomputed value by using a lookup table. Lookup tables are simple arrays
(having one or more dimensions) that hold the final output value for each possible input value.
Their strength is that we do not need to make the calculation, we just need to read the result.
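
As a rough sketch of the idea, the table for the reduction formula can be built once and then applied with a single array read per value. The function names `buildReductionTable` and `reduceInPlace` are hypothetical and only use the C++ standard library; the tutorial's own sample code may differ in detail.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using uchar = unsigned char;

// Precompute (value / divideWith) * divideWith for all 256 possible inputs.
std::array<uchar, 256> buildReductionTable(int divideWith)
{
    assert(divideWith > 0);
    std::array<uchar, 256> table{};
    for (int i = 0; i < 256; ++i)
        table[i] = static_cast<uchar>(divideWith * (i / divideWith));
    return table;
}

// Apply the table in place: one array read per element, no division.
void reduceInPlace(uchar* data, std::size_t count, const std::array<uchar, 256>& table)
{
    for (std::size_t i = 0; i < count; ++i)
        data[i] = table[data[i]];
}
```

With a divisor of 10, the inputs 7, 10, 128 and 255 become 0, 10, 120 and 250. The division and multiplication happen only 256 times while building the table, regardless of the image size.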
|
||||
|
||||
Our test case program (and the code sample below) will do the following: read in an image passed
|
||||
as a command line argument (it may be either color or grayscale) and apply the reduction
|
||||
with the given command line argument integer value. In OpenCV, at the moment there are
|
||||
three major ways of going through an image pixel by pixel. To make things a little more interesting
|
||||
we'll make the scanning of the image using each of these methods, and print out how long it took.
|
||||
|
||||
You can download the full source code [here
|
||||
](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/core/how_to_scan_images/how_to_scan_images.cpp) or look it up in
|
||||
the samples directory of OpenCV at the cpp tutorial code for the core section. Its basic usage is:
|
||||
@code{.bash}
|
||||
how_to_scan_images imageName.jpg intValueToReduce [G]
|
||||
@endcode
|
||||
The final argument is optional. If given the image will be loaded in grayscale format, otherwise
|
||||
the BGR color space is used. The first thing is to calculate the lookup table.
|
||||
|
||||
@snippet how_to_scan_images.cpp dividewith
|
||||
|
||||
Here we first use the C++ *stringstream* class to convert the third command line argument from text
to an integer format. Then we use a simple loop and the formula above to calculate the lookup table.
No OpenCV specific stuff here.
|
||||
|
||||
Another issue is how we measure time. OpenCV offers two simple functions to achieve this:
cv::getTickCount() and cv::getTickFrequency() . The first returns the number of ticks of
your system's CPU since a certain event (for example, since you booted your system). The second
returns how many times your CPU emits a tick during a second. So, measuring the amount of time
elapsed between two operations is as easy as:
|
||||
@code{.cpp}
|
||||
double t = (double)getTickCount();
|
||||
// do something ...
|
||||
t = ((double)getTickCount() - t)/getTickFrequency();
|
||||
cout << "Times passed in seconds: " << t << endl;
|
||||
@endcode
|
||||
|
||||
@anchor tutorial_how_to_scan_images_storing
|
||||
How is the image matrix stored in memory?
|
||||
-----------------------------------------
|
||||
|
||||
As you could already read in my @ref tutorial_mat_the_basic_image_container tutorial the size of the matrix
|
||||
depends on the color system used. More accurately, it depends on the number of channels used. In
|
||||
case of a grayscale image we have something like:
|
||||
|
||||

|
||||
|
||||
For multichannel images the columns contain as many sub columns as the number of channels. For
example in case of a BGR color system:
|
||||
|
||||

|
||||
|
||||
Note that the order of the channels is inverse: BGR instead of RGB. Because in many cases the memory
|
||||
is large enough to store the rows in a successive fashion the rows may follow one after another,
|
||||
creating a single long row. Because everything is in a single place following one after another this
|
||||
may help to speed up the scanning process. We can use the cv::Mat::isContinuous() function to *ask*
|
||||
the matrix if this is the case. Continue on to the next section to find an example.
|
||||
|
||||
The efficient way
|
||||
-----------------
|
||||
|
||||
When it comes to performance you cannot beat the classic C style operator[] (pointer) access.
|
||||
Therefore, the most efficient method we can recommend for making the assignment is:
|
||||
|
||||
@snippet how_to_scan_images.cpp scan-c
|
||||
|
||||
Here we basically just acquire a pointer to the start of each row and go through it until it ends.
|
||||
In the special case that the matrix is stored in a continuous manner we only need to request the
|
||||
pointer a single time and go all the way to the end. We need to look out for color images: we have
|
||||
three channels so we need to pass through three times more items in each row.
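
The row-pointer scan described above can be sketched without OpenCV by modelling the image as a byte buffer with a row stride. The `ByteImage` struct and `scanWithRowPointers` function are hypothetical stand-ins for illustration, written under the assumption of 8-bit data; they are not the tutorial's sample code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using uchar = unsigned char;

// Hypothetical stand-in for an 8-bit image: rows * cols pixels, `channels`
// values per pixel, and `step` bytes from the start of one row to the next.
struct ByteImage
{
    int rows;
    int cols;
    int channels;
    std::size_t step;          // >= cols * channels; larger when rows are padded
    std::vector<uchar> bytes;  // step * rows bytes

    uchar* rowPtr(int r) { return bytes.data() + static_cast<std::size_t>(r) * step; }
    bool isContinuous() const { return step == static_cast<std::size_t>(cols) * channels; }
};

void scanWithRowPointers(ByteImage& image, const std::array<uchar, 256>& table)
{
    assert(image.bytes.size() >= image.step * image.rows);
    int nRows = image.rows;
    int nCols = image.cols * image.channels;  // values per row, not pixels

    if (image.isContinuous())
    {
        // No gaps between rows: treat the whole buffer as one long row.
        nCols *= nRows;
        nRows = 1;
    }

    for (int i = 0; i < nRows; ++i)
    {
        uchar* p = image.rowPtr(i);  // one pointer request per row
        for (int j = 0; j < nCols; ++j)
            p[j] = table[p[j]];
    }
}
```

When the rows are padded, the loop requests a fresh pointer for every row and never touches the padding bytes. When the buffer is continuous, a single pointer request covers the whole image.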
|
||||
|
||||
There's another way of doing this. The *data* data member of a *Mat* object returns the pointer to the
first row, first column. If this pointer is null you have no valid input in that object. Checking
this is the simplest method to check if your image loading was a success. In case the storage is
continuous we can use this to go through the whole data pointer. In case of a grayscale image this
would look like:
|
||||
@code{.cpp}
|
||||
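uchar* p = I.data;

// Advance the pointer in the loop header: "*p++ = table[*p]" reads and modifies p
// in one expression, which is undefined behavior before C++17.
for( unsigned int i = 0; i < ncol*nrows; ++i, ++p)
    *p = table[*p];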
|
||||
@endcode
|
||||
You would get the same result. However, this code is a lot harder to read later on. It gets even
|
||||
harder if you have some more advanced technique there. Moreover, in practice I've observed you'll
|
||||
get the same performance result (as most of the modern compilers will probably make this small
|
||||
optimization trick automatically for you).
|
||||
|
||||
The iterator (safe) method
|
||||
--------------------------
|
||||
|
||||
In case of the efficient way, making sure that you pass through the right amount of *uchar* fields
and skip the gaps that may occur between the rows was your responsibility. The iterator method is
considered a safer way as it takes over these tasks from the user. All you need to do is to ask for the
begin and the end of the image matrix and then just increase the begin iterator until you reach the
end. To acquire the value *pointed* to by the iterator use the \* operator (add it before the iterator).
|
||||
|
||||
@snippet how_to_scan_images.cpp scan-iterator
|
||||
|
||||
In case of color images we have three uchar items per column. This may be considered a short vector
|
||||
of uchar items, that has been baptized in OpenCV with the *Vec3b* name. To access the n-th sub
|
||||
column we use simple operator[] access. It's important to remember that OpenCV iterators go through
|
||||
the columns and automatically skip to the next row. Therefore in case of color images if you use a
|
||||
simple *uchar* iterator you'll be able to access only the blue channel values.
|
||||
|
||||
On-the-fly address calculation with reference returning
|
||||
-------------------------------------------------------
|
||||
|
||||
The final method isn't recommended for scanning. It was made to acquire or modify more or less random
elements in the image. Its basic usage is to specify the row and column number of the item you want
to access. During our earlier scanning methods you could already notice that the type through which
we look at the image is important. It's no different here, as you need to manually specify what
type to use at the automatic lookup. You can observe this in case of the grayscale images for the
following source code (the usage of the cv::Mat::at() function):
|
||||
|
||||
@snippet how_to_scan_images.cpp scan-random
|
||||
|
||||
The function takes your input type and coordinates and calculates the address of the
queried item. Then it returns a reference to that. This may be a constant when you *get* the value and
non-constant when you *set* the value. As a safety step, in **debug mode only** there is a check
performed that your input coordinates are valid and do exist. If this isn't the case you'll get a
nice output message of this on the standard error output stream. Compared to the efficient way, in
release mode the only difference in using this is that for every element of the image you'll get a
new row pointer, on which we use the C operator[] to acquire the column element.
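
The address calculation can be illustrated with a small stand-in type. `RawMatrix` below is hypothetical and restricted to 8-bit values; it only mimics the idea of computing the element position from the row, the column and the row stride, and of checking the coordinates in debug builds through `assert`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using uchar = unsigned char;

// Hypothetical row-major buffer; `step` is the number of bytes per row.
struct RawMatrix
{
    int rows;
    int cols;
    int channels;
    std::size_t step;
    std::vector<uchar> bytes;

    // Return a reference to the value at (row, col, channel).
    uchar& at(int row, int col, int channel = 0)
    {
        // Bounds check; compiled out when NDEBUG is defined (release builds).
        assert(row >= 0 && row < rows);
        assert(col >= 0 && col < cols);
        assert(channel >= 0 && channel < channels);
        // Row start first, then the offset of the element inside that row.
        const std::size_t rowStart = static_cast<std::size_t>(row) * step;
        const std::size_t offset = static_cast<std::size_t>(col) * channels + channel;
        return bytes[rowStart + offset];
    }
};
```

Because the function returns a reference, the same call serves for reading (`value = m.at(1, 2)`) and for writing (`m.at(1, 2) = 42`). The row start is recomputed on every call, which is why this style costs more than a row-pointer scan over a full image.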
|
||||
|
||||
If you need to do multiple lookups using this method for an image it may be troublesome and time
consuming to enter the type and the at keyword for each of the accesses. To solve this problem
OpenCV has a cv::Mat_ data type. It's the same as Mat with the extra need that at definition
you need to specify the data type through which to look at the data matrix; in return you can
use the operator() for fast access of items. To make things even better this is easily convertible
from and to the usual cv::Mat data type. A sample usage of this you can see in case of the
color images of the function above. Nevertheless, it's important to note that the same operation
(with the same runtime speed) could have been done with the cv::Mat::at function. It's just a
trick that saves the lazy programmer some typing.
|
||||
|
||||
The Core Function
|
||||
-----------------
|
||||
|
||||
This is a bonus method of achieving lookup table modification in an image. In image
processing it's quite common that you want to map all values of a given image to some other values.
OpenCV provides a function for modifying image values, without the need to write the scanning logic
of the image. We use the cv::LUT() function of the core module. First we build a Mat type of the
lookup table:
|
||||
|
||||
@snippet how_to_scan_images.cpp table-init
|
||||
|
||||
Finally call the function (I is our input image and J the output one):
|
||||
|
||||
@snippet how_to_scan_images.cpp table-use
|
||||
|
||||
Performance Difference
|
||||
----------------------
|
||||
|
||||
For the best result compile the program and run it yourself. To make the differences more
clear, I've used a quite large (2560 X 1600) image. The performance figures presented here are for
color images. For a more accurate value I've averaged the values I got from calling the function
a hundred times.
|
||||
|
||||
Method          | Time (milliseconds)
--------------- | -------------------
Efficient Way   | 79.4717
Iterator        | 83.7201
On-The-Fly RA   | 93.7878
LUT function    | 32.5759
|
||||
|
||||
We can conclude a couple of things. If possible, use the already made functions of OpenCV (instead
of reinventing these). The fastest method turns out to be the LUT function. This is because the OpenCV
library is multi-thread enabled via Intel Threading Building Blocks. However, if you need to write a
simple image scan, prefer the pointer method. The iterator is a safer bet, however somewhat slower.
Using the on-the-fly reference access method for a full image scan is the most costly in debug mode.
In release mode it may or may not beat the iterator approach, however it surely sacrifices
the safety trait of iterators for this.
|
||||
|
||||
Finally, you may watch a sample run of the program on the [video posted](https://www.youtube.com/watch?v=fB3AN5fjgwc) on our YouTube channel.
|
||||
|
||||
@youtube{fB3AN5fjgwc}
|
||||
|
@@ -0,0 +1,196 @@
|
||||
How to use the OpenCV parallel_for_ to parallelize your code {#tutorial_how_to_use_OpenCV_parallel_for_}
|
||||
==================================================================
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_file_input_output_with_xml_yml}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Compatibility | OpenCV >= 3.0 |
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
The goal of this tutorial is to show you how to use the OpenCV `parallel_for_` framework to easily
|
||||
parallelize your code. To illustrate the concept, we will write a program to draw a Mandelbrot set
|
||||
exploiting almost all the CPU load available.
|
||||
The full tutorial code is [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp).
|
||||
If you want more information about multithreading, you will have to refer to a reference book or course as this tutorial is intended
|
||||
to remain simple.
|
||||
|
||||
Precondition
|
||||
----
|
||||
|
||||
The first precondition is to have OpenCV built with a parallel framework.
|
||||
In OpenCV 3.2, the following parallel frameworks are available in that order:
|
||||
1. Intel Threading Building Blocks (3rdparty library, should be explicitly enabled)
|
||||
2. C= Parallel C/C++ Programming Language Extension (3rdparty library, should be explicitly enabled)
|
||||
3. OpenMP (integrated to compiler, should be explicitly enabled)
|
||||
4. APPLE GCD (system wide, used automatically (APPLE only))
|
||||
5. Windows RT concurrency (system wide, used automatically (Windows RT only))
|
||||
6. Windows concurrency (part of runtime, used automatically (Windows only - MSVC++ >= 10))
|
||||
7. Pthreads (if available)
|
||||
|
||||
As you can see, several parallel frameworks can be used in the OpenCV library. Some parallel libraries
are third party libraries and have to be explicitly built and enabled in CMake (e.g. TBB, C=), others are
automatically available with the platform (e.g. APPLE GCD), but chances are that you will be able to
access a parallel framework either directly or by enabling the option in CMake and rebuilding the library.
|
||||
|
||||
The second (weak) precondition is more related to the task you want to achieve as not all computations
are suitable / can be adapted to be run in a parallel way. To remain simple, tasks that can be split
into multiple elementary operations with no memory dependency (no possible race condition) are easily
parallelizable. Computer vision processing is often easily parallelizable as most of the time the processing of
one pixel does not depend on the state of other pixels.
|
||||
|
||||
Simple example: drawing a Mandelbrot set
|
||||
----
|
||||
|
||||
We will use the example of drawing a Mandelbrot set to show how from a regular sequential code you can easily adapt
|
||||
the code to parallelize the computation.
|
||||
|
||||
Theory
|
||||
-----------
|
||||
|
||||
The Mandelbrot set was named in tribute to the mathematician Benoit Mandelbrot by the mathematician
Adrien Douady. It has become famous outside of the mathematics field as the image representation is an example of a
class of fractals, a mathematical set that exhibits a repeating pattern displayed at every scale (even more, a
Mandelbrot set is self-similar as the whole shape can be repeatedly seen at different scales). For a more in-depth
introduction, you can look at the corresponding [Wikipedia article](https://en.wikipedia.org/wiki/Mandelbrot_set).
Here, we will just introduce the formula to draw the Mandelbrot set (from the mentioned Wikipedia article).
|
||||
|
||||
> The Mandelbrot set is the set of values of \f$ c \f$ in the complex plane for which the orbit of 0 under iteration
|
||||
> of the quadratic map
|
||||
> \f[\begin{cases} z_0 = 0 \\ z_{n+1} = z_n^2 + c \end{cases}\f]
|
||||
> remains bounded.
|
||||
> That is, a complex number \f$ c \f$ is part of the Mandelbrot set if, when starting with \f$ z_0 = 0 \f$ and applying
|
||||
> the iteration repeatedly, the absolute value of \f$ z_n \f$ remains bounded however large \f$ n \f$ gets.
|
||||
> This can also be represented as
|
||||
> \f[\limsup_{n\to\infty}|z_{n+1}|\leqslant2\f]
|
||||
|
||||
Pseudocode
|
||||
-----------
|
||||
|
||||
A simple algorithm to generate a representation of the Mandelbrot set is called the
|
||||
["escape time algorithm"](https://en.wikipedia.org/wiki/Mandelbrot_set#Escape_time_algorithm).
|
||||
For each pixel in the rendered image, we test using the recurrence relation if the complex number is bounded or not
|
||||
under a maximum number of iterations. Pixels that do not belong to the Mandelbrot set will escape quickly whereas
|
||||
we assume that the pixel is in the set after a fixed maximum number of iterations. A high value of iterations will
|
||||
produce a more detailed image but the computation time will increase accordingly. We use the number of iterations
|
||||
needed to "escape" to depict the pixel value in the image.
|
||||
|
||||
```
For each pixel (Px, Py) on the screen, do:
{
  x0 = scaled x coordinate of pixel (scaled to lie in the Mandelbrot X scale (-2, 1))
  y0 = scaled y coordinate of pixel (scaled to lie in the Mandelbrot Y scale (-1, 1))
  x = 0.0
  y = 0.0
  iteration = 0
  max_iteration = 1000
  while (x*x + y*y < 2*2 AND iteration < max_iteration) {
    xtemp = x*x - y*y + x0
    y = 2*x*y + y0
    x = xtemp
    iteration = iteration + 1
  }
  color = palette[iteration]
  plot(Px, Py, color)
}
```
|
||||
|
||||
To relate the pseudocode to the theory, we have:
|
||||
* \f$ z = x + iy \f$
|
||||
* \f$ z^2 = x^2 + i2xy - y^2 \f$
|
||||
* \f$ c = x_0 + iy_0 \f$
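
The pseudocode above can also be written directly with complex numbers. The following is a minimal plain-Python sketch, not the tutorial's C++ snippet; the function name `escape_iterations` and its default `max_iteration` are assumptions chosen for illustration.

```python
def escape_iterations(c, max_iteration=1000):
    """Count the iterations of z -> z*z + c needed to leave the disc |z| < 2.

    Returns max_iteration when the orbit never escapes within the budget,
    in which case the point is assumed to belong to the Mandelbrot set.
    """
    z = 0j  # z_0 = 0
    for iteration in range(max_iteration):
        # Same test as x*x + y*y < 2*2 in the pseudocode.
        if abs(z) >= 2.0:
            return iteration
        z = z * z + c  # z_{n+1} = z_n^2 + c
    return max_iteration


if __name__ == "__main__":
    # c = 0 never escapes; c = 1 escapes after two iterations (0 -> 1 -> 2).
    print(escape_iterations(0j, 50), escape_iterations(1 + 0j))
```

Using the built-in `complex` type keeps the update step identical to the recurrence in the theory section, at the cost of hiding the separate `x` and `y` updates shown in the pseudocode.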
|
||||
|
||||

|
||||
|
||||
On this figure, we recall that the real part of a complex number is on the x-axis and the imaginary part on the y-axis.
|
||||
You can see that the whole shape reappears when we zoom in at particular locations.
|
||||
|
||||
Implementation
|
||||
-----------
|
||||
|
||||
Escape time algorithm implementation
|
||||
--------------------------
|
||||
|
||||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-escape-time-algorithm
|
||||
|
||||
Here, we used the [`std::complex`](http://en.cppreference.com/w/cpp/numeric/complex) template class to represent a
|
||||
complex number. This function performs the test to check if the pixel is in the set or not and returns the "escaped" iteration.
|
||||
|
||||
Sequential Mandelbrot implementation
|
||||
--------------------------
|
||||
|
||||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-sequential
|
||||
|
||||
In this implementation, we sequentially iterate over the pixels in the rendered image to perform the test to check if the
|
||||
pixel is likely to belong to the Mandelbrot set or not.
|
||||
|
||||
Another thing to do is to transform the pixel coordinate into the Mandelbrot set space with:
|
||||
|
||||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-transformation
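
As an illustration of the coordinate transformation, here is a small plain-Python sketch. It assumes the bounds used in the pseudocode earlier (X in (-2, 1), Y in (-1, 1)); the function name and parameters are hypothetical and the bounds used by the tutorial's own snippet may differ.

```python
def pixel_to_complex(px, py, width, height,
                     x_min=-2.0, x_max=1.0, y_min=-1.0, y_max=1.0):
    """Map pixel (px, py) of a width x height image to a point c of the complex plane.

    The default bounds are an assumption taken from the pseudocode above.
    """
    x0 = x_min + px * (x_max - x_min) / width    # real part, along the x-axis
    y0 = y_min + py * (y_max - y_min) / height   # imaginary part, along the y-axis
    return complex(x0, y0)


if __name__ == "__main__":
    # The top-left pixel maps to the lower bounds of both ranges.
    print(pixel_to_complex(0, 0, 300, 200))
```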
|
||||
|
||||
Finally, to assign the grayscale value to the pixels, we use the following rule:
|
||||
* a pixel is black if it reaches the maximum number of iterations (pixel is assumed to be in the Mandelbrot set),
|
||||
* otherwise we assign a grayscale value depending on the escaped iteration and scaled to fit the grayscale range.
|
||||
|
||||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-grayscale-value
|
||||
|
||||
Using a linear scale transformation is not enough to perceive the grayscale variation. To overcome this, we will boost
|
||||
the perception by using a square root scale transformation (borrowed from Jeremy D. Frens in his
|
||||
[blog post](http://www.programming-during-recess.net/2016/06/26/color-schemes-for-mandelbrot-sets/)):
|
||||
\f$ f \left( x \right) = \sqrt{\frac{x}{\text{maxIter}}} \times 255 \f$
|
||||
|
||||

|
||||
|
||||
The green curve corresponds to a simple linear scale transformation, the blue one to a square root scale transformation
|
||||
and you can observe how the lowest values will be boosted when looking at the slope at these positions.
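
To make the difference between the two scales concrete, the sketch below computes both mappings for a given escaped iteration. It is a plain-Python illustration under assumptions: the function names are hypothetical, and rounding to the nearest integer is a choice made here rather than something the tutorial states.

```python
import math


def gray_linear(iterations, max_iter):
    """Linear scale: black for points assumed to be in the set, else x / maxIter * 255."""
    if iterations >= max_iter:
        return 0
    return round(iterations / max_iter * 255)


def gray_sqrt(iterations, max_iter):
    """Square root scale: black for points in the set, else sqrt(x / maxIter) * 255."""
    if iterations >= max_iter:
        return 0
    return round(math.sqrt(iterations / max_iter) * 255)


if __name__ == "__main__":
    # A pixel that escapes early is much brighter with the square root scale.
    print(gray_linear(20, 500), gray_sqrt(20, 500))
```

With `max_iter = 500`, a pixel escaping after 20 iterations is mapped to a much larger value by the square root scale than by the linear one, which is the boost of low values visible in the figure.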
|
||||
|
||||
Parallel Mandelbrot implementation
|
||||
--------------------------
|
||||
|
||||
When looking at the sequential implementation, we can notice that each pixel is computed independently. To optimize the
|
||||
computation, we can perform multiple pixel calculations in parallel, by exploiting the multi-core architecture of modern
|
||||
processors. To achieve this easily, we will use the OpenCV @ref cv::parallel_for_ framework.
|
||||
|
||||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel
|
||||
|
||||
The first thing is to declare a custom class that inherits from @ref cv::ParallelLoopBody and to override the
|
||||
`virtual void operator ()(const cv::Range& range) const`.
|
||||
|
||||
The range in the `operator ()` represents the subset of pixels that will be treated by an individual thread.
|
||||
This splitting is done automatically to distribute the computation load equally. We have to convert the pixel index coordinate
to a 2D `[row, col]` coordinate. Also note that we have to keep a reference to the Mat image to be able to modify the image
in place.
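
The index conversion can be sketched in a few lines. The following plain-Python example is only an illustration of what a loop body has to do with the range it receives; it does not use the OpenCV API, and the names `index_to_row_col`, `process_range` and `value_at` are hypothetical.

```python
def index_to_row_col(index, cols):
    """Convert a linear pixel index into (row, col) for a row-major image."""
    return divmod(index, cols)


def process_range(image, start, end, value_at):
    """Fill pixels start..end-1 (linear indices) of a list-of-rows image in place.

    This mirrors what one worker does with the sub-range it is given.
    """
    cols = len(image[0])
    for index in range(start, end):
        row, col = index_to_row_col(index, cols)
        image[row][col] = value_at(row, col)


if __name__ == "__main__":
    img = [[0] * 4 for _ in range(3)]
    process_range(img, 2, 6, lambda r, c: 10 * r + c)
    print(img)
```

Because each call only writes the pixels of its own sub-range, several such calls on disjoint ranges can run concurrently on the same image without interfering.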
|
||||
|
||||
The parallel execution is called with:
|
||||
|
||||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel-call
|
||||
|
||||
Here, the range represents the total number of operations to be executed, so the total number of pixels in the image.
|
||||
To set the number of threads, you can use: @ref cv::setNumThreads. You can also specify the number of splits using the
`nstripes` parameter in @ref cv::parallel_for_. For instance, if your processor has 4 threads, calling `cv::setNumThreads(2)`
or setting `nstripes=2` should have the same effect: by default all the available processor threads are used, but here the
workload is split across only two threads.
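
To picture what splitting a range into stripes means, here is a plain-Python sketch that cuts `[0, total)` into near-equal contiguous sub-ranges. This is an assumption-laden illustration: the function `split_range` is hypothetical, and the partitioning that OpenCV performs internally depends on the backend and may differ.

```python
def split_range(total, nstripes):
    """Split [0, total) into at most nstripes contiguous, near-equal (start, end) pairs."""
    nstripes = max(1, min(nstripes, total))
    base, extra = divmod(total, nstripes)
    stripes = []
    start = 0
    for i in range(nstripes):
        # The first `extra` stripes take one additional element each.
        end = start + base + (1 if i < extra else 0)
        stripes.append((start, end))
        start = end
    return stripes


if __name__ == "__main__":
    # Ten pixels split into three stripes.
    print(split_range(10, 3))
```

Each `(start, end)` pair plays the role of the range passed to one invocation of the loop body.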
|
||||
|
||||
@note
|
||||
The C++11 standard makes it possible to simplify the parallel implementation by getting rid of the `ParallelMandelbrot` class and replacing it with a lambda expression:
|
||||
|
||||
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel-call-cxx11
|
||||
|
||||
Results
|
||||
-----------
|
||||
|
||||
You can find the full tutorial code [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp).
|
||||
The performance of the parallel implementation depends on the type of CPU you have. For instance, on a 4-core / 8-thread
|
||||
CPU, you can expect a speed-up of around 6.9X. There are many factors to explain why we do not achieve a speed-up of almost 8X.
|
||||
The main reasons are most likely:
|
||||
* the overhead to create and manage the threads,
|
||||
* background processes running in parallel,
|
||||
* the difference between 4 hardware cores with 2 logical threads for each core and 8 hardware cores.
|
||||
|
||||
The resulting image produced by the tutorial code (you can modify the code to use more iterations and assign a pixel color
|
||||
depending on the escaped iteration and using a color palette to get more aesthetic images):
|
||||

|
||||
|
@@ -0,0 +1,201 @@
|
||||
Mask operations on matrices {#tutorial_mat_mask_operations}
|
||||
===========================
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_how_to_scan_images}
|
||||
@next_tutorial{tutorial_mat_operations}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Bernát Gábor |
|
||||
| Compatibility | OpenCV >= 3.0 |
|
||||
|
||||
Mask operations on matrices are quite simple. The idea is that we recalculate each pixel's value in
|
||||
an image according to a mask matrix (also known as kernel). This mask holds values that will adjust
|
||||
how much influence neighboring pixels (and the current pixel) have on the new pixel value. From a
|
||||
mathematical point of view we make a weighted average, with our specified values.
|
||||
|
||||
Our test case
|
||||
-------------
|
||||
|
||||
Let's consider the issue of an image contrast enhancement method. Basically we want to apply for
|
||||
every pixel of the image the following formula:
|
||||
|
||||
\f[I(i,j) = 5*I(i,j) - [ I(i-1,j) + I(i+1,j) + I(i,j-1) + I(i,j+1)]\f]\f[\iff I(i,j)*M, \text{where }
|
||||
M = \bordermatrix{ _i\backslash ^j & -1 & 0 & +1 \cr
|
||||
-1 & 0 & -1 & 0 \cr
|
||||
0 & -1 & 5 & -1 \cr
|
||||
+1 & 0 & -1 & 0 \cr
|
||||
}\f]
|
||||
|
||||
The first notation is by using a formula, while the second is a compacted version of the first by
|
||||
using a mask. You use the mask by putting the center of the mask matrix (denoted above by
|
||||
the zero-zero index) on the pixel you want to calculate and sum up the pixel values multiplied with
|
||||
the overlapped matrix values. It's the same thing, however in case of large matrices the latter
|
||||
notation is a lot easier to look over.
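
The weighted sum described above can be sketched for a single pixel as follows. This is a plain-Python illustration on a list-of-rows grayscale image, not the tutorial's sample code; `SHARPEN_MASK` and `apply_mask_at` are hypothetical names.

```python
# The mask M from the formula above; the center entry is at index [1][1].
SHARPEN_MASK = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def apply_mask_at(image, i, j, mask=SHARPEN_MASK):
    """Return the weighted sum of the 3x3 neighbourhood of interior pixel (i, j)."""
    total = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            # Center the mask on (i, j) and multiply overlapping entries.
            total += mask[di + 1][dj + 1] * image[i + di][j + dj]
    return total


if __name__ == "__main__":
    img = [[10, 20, 30],
           [40, 50, 60],
           [70, 80, 90]]
    # Same as 5*50 - (20 + 80 + 40 + 60).
    print(apply_mask_at(img, 1, 1))
```

The raw sum is returned here without clamping; bringing it back into the 8-bit range is a separate step.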
|
||||
|
||||
Code
|
||||
----
|
||||
|
||||
@add_toggle_cpp
|
||||
You can download this source code from [here
|
||||
](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp) or look in the
|
||||
OpenCV source code libraries sample directory at
|
||||
`samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp`.
|
||||
@include samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
You can download this source code from [here
|
||||
](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java) or look in the
|
||||
OpenCV source code libraries sample directory at
|
||||
`samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java`.
|
||||
@include samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
You can download this source code from [here
|
||||
](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py) or look in the
|
||||
OpenCV source code libraries sample directory at
|
||||
`samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py`.
|
||||
@include samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py
|
||||
@end_toggle
|
||||
|
||||
The Basic Method
|
||||
----------------
|
||||
|
||||
Now let us see how we can make this happen by using the basic pixel access method or by using the
|
||||
**filter2D()** function.
|
||||
|
||||
Here's a function that will do this:
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp basic_method
|
||||
|
||||
First we make sure that the input image data is in unsigned char format. For this we use the
|
||||
@ref cv::CV_Assert function that throws an error when the expression inside it is false.
|
||||
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp 8_bit
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java basic_method
|
||||
|
||||
First we make sure that the input image data is in unsigned 8-bit format.
|
||||
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java 8_bit
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py basic_method
|
||||
|
||||
First we make sure that the input image data is in unsigned 8-bit format.
|
||||
@code{.py}
|
||||
assert my_image.dtype == np.uint8  # cv.cvtColor expects a color conversion code, not a depth such as cv.CV_8U
|
||||
@endcode
|
||||
|
||||
@end_toggle
|
||||
|
||||
We create an output image with the same size and the same type as our input. As you can see in the
|
||||
@ref tutorial_how_to_scan_images_storing "storing" section, depending on the number of channels we may have one or more
|
||||
subcolumns.
|
||||
|
||||
@add_toggle_cpp
|
||||
We will iterate through them via pointers so the total number of elements depends on
|
||||
this number.
|
||||
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp create_channels
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java create_channels
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@code{.py}
|
||||
height, width, n_channels = my_image.shape
|
||||
result = np.zeros(my_image.shape, my_image.dtype)
|
||||
@endcode
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_cpp
|
||||
We'll use the plain C [] operator to access pixels. Because we need to access multiple rows at the
|
||||
same time we'll acquire the pointers for each of them (a previous, a current and a next line). We
|
||||
need another pointer to where we're going to save the calculation. Then simply access the right
|
||||
items with the [] operator. For moving the output pointer ahead we simply increase this (with one
|
||||
byte) after each operation:
|
||||
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp basic_method_loop
|
||||
|
||||
On the borders of the image the notation above results in nonexistent pixel locations (like (-1,-1)).
In these points our formula is undefined. A simple solution is to not apply the kernel
|
||||
in these points and, for example, set the pixels on the borders to zeros:
|
||||
|
||||
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp borders
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
We need to access multiple rows and columns which can be done by adding or subtracting 1 to the current center (i,j).
|
||||
Then we apply the sum and put the new value in the Result matrix.
|
||||
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java basic_method_loop
|
||||
|
||||
On the borders of the image the notation above results in nonexistent pixel locations (like (-1,-1)).
|
||||
In these points our formula is undefined. A simple solution is to not apply the kernel
|
||||
in these points and, for example, set the pixels on the borders to zeros:
|
||||
|
||||
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java borders
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
We need to access multiple rows and columns which can be done by adding or subtracting 1 to the current center (i,j).
|
||||
Then we apply the sum and put the new value in the Result matrix.
|
||||
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py basic_method_loop
|
||||
@end_toggle
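
Putting the loop, the border handling and the 8-bit range together, the basic method can be sketched for a grayscale image as below. This is a plain-Python illustration on lists, written under assumptions: it is not the sample's code, the names `saturate_u8` and `sharpen` are chosen here, and clamping to 0..255 stands in for the saturation that 8-bit storage requires.

```python
def saturate_u8(value):
    """Clamp an integer to the unsigned 8-bit range 0..255."""
    return max(0, min(255, value))


def sharpen(image):
    """Apply I = 5*I(i,j) - (up + down + left + right) to a list-of-rows image.

    Border pixels, where the formula is undefined, are left at zero.
    """
    rows, cols = len(image), len(image[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            value = (5 * image[i][j]
                     - image[i - 1][j] - image[i + 1][j]
                     - image[i][j - 1] - image[i][j + 1])
            result[i][j] = saturate_u8(value)
    return result


if __name__ == "__main__":
    flat = [[100] * 3 for _ in range(3)]
    # A flat region is unchanged in the interior; the border is zeroed.
    print(sharpen(flat))
```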
|
||||
|
||||
The filter2D function
|
||||
---------------------
|
||||
|
||||
Applying such filters is so common in image processing that in OpenCV there is a function that
|
||||
will take care of applying the mask (also called a kernel in some places). For this you first need
|
||||
to define an object that holds the mask:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp kern
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java kern
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py kern
|
||||
@end_toggle
|
||||
|
||||
Then call the **filter2D()** function specifying the input, the output image and the kernel to
|
||||
use:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp filter2D
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java filter2D
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py filter2D
|
||||
@end_toggle
|
||||
|
||||
The function even has a fifth optional argument to specify the center of the kernel, a sixth
|
||||
for adding an optional value to the filtered pixels before storing them in the output image and a seventh one
|
||||
for determining what to do in the regions where the operation is undefined (borders).
|
||||
|
||||
This function is shorter, less verbose and, because there are some optimizations, it is usually faster
|
||||
than the *hand-coded method*. For example in my test while the second one took only 13
|
||||
milliseconds the first took around 31 milliseconds. Quite some difference.
|
||||
|
||||
For example:
|
||||
|
||||

|
||||
|
||||
@add_toggle_cpp
|
||||
Check out an instance of running the program on our [YouTube
|
||||
channel](http://www.youtube.com/watch?v=7PF1tAU9se4).
|
||||
@youtube{7PF1tAU9se4}
|
||||
@end_toggle
|
||||
270
doc/tutorials/core/mat_operations.markdown
Normal file
@@ -0,0 +1,270 @@
|
||||
Operations with images {#tutorial_mat_operations}
|
||||
======================
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@prev_tutorial{tutorial_mat_mask_operations}
|
||||
@next_tutorial{tutorial_adding_images}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Compatibility | OpenCV >= 3.0 |
|
||||
|
||||
Input/Output
|
||||
------------
|
||||
|
||||
### Images
|
||||
|
||||
Load an image from a file:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Load an image from a file
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Load an image from a file
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Load an image from a file
|
||||
@end_toggle
|
||||
|
||||
If you read a jpg file, a 3 channel image is created by default. If you need a grayscale image, use:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Load an image from a file in grayscale
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Load an image from a file in grayscale
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Load an image from a file in grayscale
|
||||
@end_toggle
|
||||
|
||||
@note Format of the file is determined by its content (first few bytes).

To save an image to a file:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Save image
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Save image
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Save image
|
||||
@end_toggle
|
||||
|
||||
@note Format of the file is determined by its extension.
|
||||
|
||||
@note Use cv::imdecode and cv::imencode to read and write an image from/to memory rather than a file.
|
||||
|
||||
Basic operations with images
|
||||
----------------------------
|
||||
|
||||
### Accessing pixel intensity values
|
||||
|
||||
In order to get a pixel intensity value, you have to know the type of the image and the number of
|
||||
channels. Here is an example for a single channel grey scale image (type 8UC1) and pixel coordinates
|
||||
x and y:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 1
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Pixel access 1
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Pixel access 1
|
||||
@end_toggle
|
||||
|
||||
C++ version only:
|
||||
intensity.val[0] contains a value from 0 to 255. Note the ordering of x and y. Since in OpenCV
|
||||
images are represented by the same structure as matrices, we use the same convention for both
|
||||
cases - the 0-based row index (or y-coordinate) goes first and the 0-based column index (or
|
||||
x-coordinate) follows it. Alternatively, you can use the following notation (**C++ only**):
|
||||
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 2
|
||||
|
||||
Now let us consider a 3 channel image with BGR color ordering (the default format returned by
|
||||
imread):
|
||||
|
||||
**C++ code**
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 3
|
||||
|
||||
**Python code**
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Pixel access 3
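
The row-first convention and the BGR channel order can be illustrated on a raw byte buffer. The sketch below is plain Python and assumes a continuous, row-major buffer with interleaved channels and no padding between rows; the helper names are hypothetical.

```python
def pixel_offset(y, x, width, channels):
    """Byte offset of pixel (row y, column x) in a continuous row-major 8-bit buffer."""
    return (y * width + x) * channels


def get_bgr(buffer, y, x, width):
    """Return the (blue, green, red) values of a 3-channel pixel; row index comes first."""
    start = pixel_offset(y, x, width, 3)
    blue, green, red = buffer[start:start + 3]
    return blue, green, red


if __name__ == "__main__":
    # A 2x2 image with 3 channels, filled with the byte values 0..11.
    data = bytes(range(2 * 2 * 3))
    print(get_bgr(data, 1, 0, 2))
```

Swapping `y` and `x` in the call addresses a different pixel, which is why the ordering of the two indices matters.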
|
||||
|
||||
You can use the same method for floating-point images (for example, you can get such an image by
|
||||
running Sobel on a 3 channel image) (**C++ only**):
|
||||
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 4
|
||||
|
||||
The same method can be used to change pixel intensities:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 5
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Pixel access 5
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Pixel access 5
|
||||
@end_toggle
|
||||
|
||||
There are functions in OpenCV, especially from calib3d module, such as cv::projectPoints, that take an
|
||||
array of 2D or 3D points in the form of Mat. Matrix should contain exactly one column, each row
|
||||
corresponds to a point, matrix type should be 32FC2 or 32FC3 correspondingly. Such a matrix can be
|
||||
easily constructed from `std::vector` (**C++ only**):
|
||||
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Mat from points vector
|
||||
|
||||
One can access a point in this matrix using the same method `Mat::at` (**C++ only**):
|
||||
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Point access
|
||||
|
||||
### Memory management and reference counting
|
||||
|
||||
Mat is a structure that keeps matrix/image characteristics (rows and columns number, data type etc)
|
||||
and a pointer to data. So nothing prevents us from having several instances of Mat corresponding to
|
||||
the same data. A Mat keeps a reference count that tells if data has to be deallocated when a
|
||||
particular instance of Mat is destroyed. Here is an example of creating two matrices without copying
|
||||
data (**C++ only**):
|
||||
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Reference counting 1
|
||||
|
||||
As a result, we get a 32FC1 matrix with 3 columns instead of 32FC3 matrix with 1 column. `pointsMat`
|
||||
uses data from points and will not deallocate the memory when destroyed. In this particular
|
||||
instance, however, the developer has to make sure that the lifetime of `points` is longer than that of `pointsMat`.
|
||||
If we need to copy the data, this is done using, for example, cv::Mat::copyTo or cv::Mat::clone:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Reference counting 2
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Reference counting 2
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Reference counting 2
|
||||
@end_toggle
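
The difference between sharing data and copying it can be shown with an analogy from the Python standard library. This sketch does not use OpenCV: a `bytearray` stands in for the pixel data, a `memoryview` for a second header onto the same data, and `bytes(...)` for a deep copy. It is only an analogy under these assumptions.

```python
# One underlying buffer, standing in for the pixel data of a matrix.
pixels = bytearray([10, 20, 30, 40])

shared_view = memoryview(pixels)   # second "header" onto the same bytes, no copy
independent = bytes(pixels)        # deep copy, in the spirit of clone()/copyTo()

pixels[0] = 99                     # modify the data through the original object

print(shared_view[0])   # reflects the change, because the data is shared
print(independent[0])   # keeps the old value, because it was copied
```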
|
||||
|
||||
An empty output Mat can be supplied to each function.
|
||||
Each implementation calls Mat::create for a destination matrix.
|
||||
This method allocates data for a matrix if it is empty.
|
||||
If it is not empty and has the correct size and type, the method does nothing.
|
||||
If, however, the size or type differs from the input arguments, the data is deallocated (and lost) and new data is allocated.
|
||||
For example:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Reference counting 3
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Reference counting 3
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Reference counting 3
|
||||
@end_toggle
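
The allocation rule described above can be modelled in a few lines. The class below is a hypothetical plain-Python model, not the OpenCV implementation: it ignores the element type for brevity, and its `create` returns a boolean purely so the behaviour can be observed.

```python
class Buffer:
    """Toy model of a destination matrix that allocates lazily."""

    def __init__(self):
        self.rows = 0
        self.cols = 0
        self.data = None

    def create(self, rows, cols):
        """Allocate only when needed; return True if a new allocation happened."""
        if self.data is not None and (self.rows, self.cols) == (rows, cols):
            return False  # correct size already: nothing to do, data is reused
        # Empty or wrong size: the old data (if any) is dropped and replaced.
        self.rows, self.cols = rows, cols
        self.data = [[0] * cols for _ in range(rows)]
        return True


if __name__ == "__main__":
    dst = Buffer()
    print(dst.create(2, 3), dst.create(2, 3), dst.create(4, 4))
```

The second call with the same size reuses the existing storage, while the third call with a different size replaces it.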
|
||||
|
||||
### Primitive operations
|
||||
|
||||
There are a number of convenient operators defined on a matrix. For example, here is how we can make
|
||||
a black image from an existing greyscale image `img`:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Set image to black
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Set image to black
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Set image to black
|
||||
@end_toggle
|
||||
|
||||
Selecting a region of interest:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Select ROI
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Select ROI
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Select ROI
|
||||
@end_toggle
|
||||
|
||||
Conversion from color to greyscale:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp BGR to Gray
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java BGR to Gray
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py BGR to Gray
|
||||
@end_toggle
|
||||
|
||||
Change image type from 8UC1 to 32FC1:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Convert to CV_32F
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Convert to CV_32F
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Convert to CV_32F
|
||||
@end_toggle
|
||||
|
||||
### Visualizing images
|
||||
|
||||
It is very useful to see intermediate results of your algorithm during the development process. OpenCV
|
||||
provides a convenient way of visualizing images. An 8U image can be shown using:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp imshow 1
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java imshow 1
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py imshow 1
|
||||
@end_toggle
|
||||
|
||||
A call to waitKey() starts a message passing cycle that waits for a key stroke in the "image"
|
||||
window. A 32F image needs to be converted to 8U type. For example:
|
||||
|
||||
@add_toggle_cpp
|
||||
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp imshow 2
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_java
|
||||
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java imshow 2
|
||||
@end_toggle
|
||||
|
||||
@add_toggle_python
|
||||
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py imshow 2
|
||||
@end_toggle
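
One common way to bring floating-point values into the 8-bit range is min-max scaling. The plain-Python sketch below shows that idea on a flat list of values; it is an assumption about the kind of conversion meant here, the function name `to_8u` is hypothetical, and the tutorial's own snippet may scale differently.

```python
def to_8u(values):
    """Linearly rescale floats so that the minimum maps to 0 and the maximum to 255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant image has no range to stretch; map everything to 0.
        return [0 for _ in values]
    scale = 255.0 / (hi - lo)
    return [round((v - lo) * scale) for v in values]


if __name__ == "__main__":
    # Values such as gradient responses can be negative or exceed 255.
    print(to_8u([-4.0, 0.0, 4.0]))
```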
|
||||
|
||||
@note Here cv::namedWindow is not necessary since it is immediately followed by cv::imshow.
|
||||
Nevertheless, it can be used to change the window properties or when using cv::createTrackbar.
|
||||
|
@@ -0,0 +1,278 @@
|
||||
Mat - The Basic Image Container {#tutorial_mat_the_basic_image_container}
|
||||
===============================
|
||||
|
||||
@tableofcontents
|
||||
|
||||
@next_tutorial{tutorial_how_to_scan_images}
|
||||
|
||||
| | |
|
||||
| -: | :- |
|
||||
| Original author | Bernát Gábor |
|
||||
| Compatibility | OpenCV >= 3.0 |
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
We have multiple ways to acquire digital images from the real world: digital cameras, scanners,
|
||||
computed tomography, and magnetic resonance imaging to name a few. In every case what we (humans)
|
||||
see are images. However, when transforming this to our digital devices what we record are numerical
|
||||
values for each of the points of the image.
|
||||
|
||||

|
||||
|
||||
For example in the above image you can see that the mirror of the car is nothing more than a matrix
|
||||
containing all the intensity values of the pixel points. How we get and store the pixel values may
|
||||
vary according to our needs, but in the end all images inside a computer world may be reduced to
|
||||
numerical matrices and other information describing the matrix itself. *OpenCV* is a computer vision
|
||||
library whose main focus is to process and manipulate this information. Therefore, the first thing
|
||||
you need to be familiar with is how OpenCV stores and handles images.
|
||||
|
||||
Mat
|
||||
---
|
||||
|
||||
OpenCV has been around since 2001. In those days the library was built around a *C* interface and to
|
||||
store the image in the memory they used a C structure called *IplImage*. This is the one you'll see
|
||||
in most of the older tutorials and educational materials. The problem with this is that it brings to
|
||||
the table all the minuses of the C language. The biggest issue is the manual memory management. It
|
||||
builds on the assumption that the user is responsible for taking care of memory allocation and
|
||||
deallocation. While this is not a problem with smaller programs, once your code base grows it will
|
||||
be more of a struggle to handle all this rather than focusing on solving your development goal.
|
||||
|
||||
Luckily C++ came around and introduced the concept of classes, making things easier for the user through
|
||||
automatic memory management (more or less). The good news is that C++ is largely compatible with C, so
few compatibility issues arise from making the change. Therefore, OpenCV 2.0 introduced a new C++
|
||||
interface which offered a new way of doing things which means you do not need to fiddle with memory
|
||||
management, making your code concise (less to write, to achieve more). The main downside of the C++
|
||||
interface is that many embedded development systems at the moment support only C. Therefore, unless
|
||||
you are targeting embedded platforms, there's no point to using the *old* methods (unless you're a
|
||||
masochist programmer and you're asking for trouble).
|
||||
|
||||
The first thing you need to know about *Mat* is that you no longer need to manually allocate its
memory and release it as soon as you do not need it. While doing this is still a possibility, most
OpenCV functions allocate their output data automatically. As a nice bonus, if you pass in
an already existing *Mat* object, which has already allocated the required space for the matrix,
this will be reused. In other words, we use at all times only as much memory as we need to perform
the task.

*Mat* is basically a class with two data parts: the matrix header (containing information such as
the size of the matrix, the method used for storing, the address at which the matrix is stored, and so
on) and a pointer to the matrix containing the pixel values (taking any dimensionality depending on
the method chosen for storing). The matrix header size is constant; however, the size of the matrix
itself may vary from image to image and is usually larger by orders of magnitude.

OpenCV is an image processing library. It contains a large collection of image processing functions.
To solve a computational challenge, most of the time you will end up using multiple functions of the
library. Because of this, passing images to functions is a common practice. We should not forget
that we are talking about image processing algorithms, which tend to be quite computationally heavy.
The last thing we want to do is further decrease the speed of your program by making unnecessary
copies of potentially *large* images.

To tackle this issue OpenCV uses a reference counting system. The idea is that each *Mat* object has
its own header; however, a matrix may be shared between two *Mat* objects by having their matrix
pointers point to the same address. Moreover, the copy operators **will only copy the headers** and
the pointer to the large matrix, not the data itself.

@code{.cpp}
Mat A, C;                          // creates just the header parts
A = imread(argv[1], IMREAD_COLOR); // here we'll know the method used (allocate matrix)

Mat B(A);                          // Use the copy constructor

C = A;                             // Assignment operator
@endcode

All the above objects, in the end, point to the same single data matrix, and making a modification
using any of them will affect all the other ones as well. In practice the different objects just
provide different access methods to the same underlying data. Nevertheless, their header parts are
different. The really interesting part is that you can create headers which refer to only a subsection
of the full data. For example, to create a region of interest (*ROI*) in an image you just create
a new header with the new boundaries:
@code{.cpp}
Mat D (A, Rect(10, 10, 100, 100) ); // using a rectangle
Mat E = A(Range::all(), Range(1,3)); // using row and column boundaries
@endcode
Now you may ask: if the matrix itself may belong to multiple *Mat* objects, who takes responsibility
for cleaning it up when it's no longer needed? The short answer is: the last object that used it.
This is handled by using a reference counting mechanism. Whenever somebody copies a header of a
*Mat* object, a counter is increased for the matrix. Whenever a header is cleaned up, this counter
is decreased. When the counter reaches zero the matrix is freed. Sometimes you will want to copy
the matrix itself too, so OpenCV provides the @ref cv::Mat::clone() and @ref cv::Mat::copyTo() functions.
@code{.cpp}
Mat F = A.clone();
Mat G;
A.copyTo(G);
@endcode
Now modifying *F* or *G* will not affect the matrix pointed to by *A*'s header. What you need to
remember from all this is that:

- Output image allocation for OpenCV functions is automatic (unless specified otherwise).
- You do not need to think about memory management with OpenCV's C++ interface.
- The assignment operator and the copy constructor only copy the header.
- The underlying matrix of an image may be copied using the @ref cv::Mat::clone() and @ref cv::Mat::copyTo()
  functions.


Storing methods
-----------------

This is about how you store the pixel values. You can select the color space and the data type used.
The color space refers to how we combine color components in order to encode a given color. The
simplest one is grayscale, where the colors at our disposal are black and white. The combination
of these allows us to create many shades of gray.

For *colorful* images we have many more methods to choose from. Each of them breaks a color down into three
or four basic components, and we can combine these to create all the other colors. The most
popular one is RGB, mainly because this is also how our eye builds up colors. Its base colors are
red, green and blue. To encode the transparency of a color, a fourth element, alpha (A), is sometimes
added.

There are, however, many other color systems, each with their own advantages:

- RGB is the most common, as our eyes use something similar; however, keep in mind that the OpenCV standard display
  system composes colors using the BGR color space (the red and blue channels swap places).
- HSV and HLS decompose colors into their hue, saturation and value/luminance components,
  which is a more natural way for us to describe colors. You might, for example, dismiss the last
  component, making your algorithm less sensitive to the lighting conditions of the input image.
- YCrCb is used by the popular JPEG image format.
- CIE L\*a\*b\* is a perceptually uniform color space, which comes in handy if you need to measure
  the *distance* of a given color to another color.

Each of the building components has its own valid domain. This leads to the data type used. How
we store a component defines the control we have over its domain. The smallest data type possible is
*char*, which means one byte or 8 bits. This may be unsigned (so it can store values from 0 to 255) or
signed (values from -128 to +127). Although with three components this already gives about 16
million possible colors to represent (as in the case of RGB), we may acquire even finer control by
using the float (4 bytes = 32 bits) or double (8 bytes = 64 bits) data types for each component.
Nevertheless, remember that increasing the size of a component also increases the size of the whole
picture in memory.

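
The arithmetic behind these numbers can be written down in a few lines of plain C++ (no OpenCV needed).
The helper names `colorCount` and `imageBytes` are invented for this sketch, and the byte counts assume the
usual platforms where `float` is 4 bytes and `double` is 8 bytes.

```cpp
#include <cstddef>
#include <limits>

// Number of distinct colors when each of `channels` components has 2^bits levels.
constexpr unsigned long long colorCount(unsigned bits, unsigned channels)
{
    return 1ULL << (bits * channels);
}

// Bytes needed for a width x height image with `channels` components of type T.
template <typename T>
constexpr std::size_t imageBytes(std::size_t width, std::size_t height, std::size_t channels)
{
    return width * height * channels * sizeof(T);
}

// Value ranges of the 8-bit types mentioned in the text.
static_assert(std::numeric_limits<unsigned char>::max() == 255, "unsigned 8-bit range is 0..255");
static_assert(std::numeric_limits<signed char>::min() == -128, "signed 8-bit range starts at -128");
static_assert(std::numeric_limits<signed char>::max() == 127, "signed 8-bit range ends at +127");
```

Here `colorCount(8, 3)` is 16777216, the "16 million" colors of 8-bit RGB. A 640x480 three-channel image
takes 921600 bytes with `unsigned char` components, four times that with `float`, and eight times that with
`double`.
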

Creating a Mat object explicitly
----------------------------------

In the @ref tutorial_load_save_image tutorial you have already learned how to write a matrix to an image
file by using the @ref cv::imwrite() function. However, for debugging purposes it's much more
convenient to see the actual values. You can do this using the \<\< operator of *Mat*. Be aware that
this only works for two-dimensional matrices.

Although *Mat* works really well as an image container, it is also a general matrix class.
Therefore, it is possible to create and manipulate multidimensional matrices. You can create a *Mat*
object in multiple ways:

- @ref cv::Mat::Mat Constructor

  @snippet mat_the_basic_image_container.cpp constructor

  For two-dimensional and multichannel images we first define their size: first the row count, then the column count.

  Then we need to specify the data type to use for storing the elements and the number of channels
  per matrix point. To do this we have multiple definitions constructed according to the following
  convention:
  @code
  CV_[The number of bits per item][Signed or Unsigned][Type Prefix]C[The channel number]
  @endcode
  For instance, *CV_8UC3* means we use unsigned char types that are 8 bits long and each pixel has
  three of these to form the three channels. There are types predefined for up to four channels. The
  @ref cv::Scalar is a four-element short vector. Specify it and you can initialize all matrix
  points with a custom value. If you need more channels, you can create the type with the macro above, setting
  the channel number in parentheses as you can see below.

- Use C/C++ arrays and initialize via constructor

  @snippet mat_the_basic_image_container.cpp init

  The example above shows how to create a matrix with more than two dimensions. Specify the number of
  dimensions, then pass a pointer to an array containing the size of each dimension; the rest remains the
  same.

- @ref cv::Mat::create function:

  @snippet mat_the_basic_image_container.cpp create

  You cannot initialize the matrix values with this construction. It will only reallocate its matrix
  data memory if the new size does not fit into the old one.

- MATLAB style initializers: @ref cv::Mat::zeros , @ref cv::Mat::ones , @ref cv::Mat::eye . Specify the size and
  data type to use:

  @snippet mat_the_basic_image_container.cpp matlab

- For small matrices you may use comma-separated initializers or initializer lists (C++11 support is required in the latter case):

  @snippet mat_the_basic_image_container.cpp comma

  @snippet mat_the_basic_image_container.cpp list

- Create a new header for an existing *Mat* object and @ref cv::Mat::clone or @ref cv::Mat::copyTo it.

  @snippet mat_the_basic_image_container.cpp clone

@note
   You can fill a matrix with random values using the @ref cv::randu() function. You need to
   give a lower and upper limit for the random values:
   @snippet mat_the_basic_image_container.cpp random


Output formatting
-----------------

In the above examples you could see the default formatting option. OpenCV, however, allows you to
format your matrix output:

- Default
  @snippet mat_the_basic_image_container.cpp out-default

- Python
  @snippet mat_the_basic_image_container.cpp out-python

- Comma separated values (CSV)
  @snippet mat_the_basic_image_container.cpp out-csv

- NumPy
  @snippet mat_the_basic_image_container.cpp out-numpy

- C
  @snippet mat_the_basic_image_container.cpp out-c


Output of other common items
----------------------------

OpenCV also supports output of other common OpenCV data structures via the \<\< operator:

- 2D Point
  @snippet mat_the_basic_image_container.cpp out-point2

- 3D Point
  @snippet mat_the_basic_image_container.cpp out-point3

- std::vector via cv::Mat
  @snippet mat_the_basic_image_container.cpp out-vector

- std::vector of points
  @snippet mat_the_basic_image_container.cpp out-vector-points

Most of the samples here have been included in a small console application. You can download it from
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/core/mat_the_basic_image_container/mat_the_basic_image_container.cpp)
or find it in the core section of the cpp samples.

You can also find a quick video demonstration of this on
[YouTube](https://www.youtube.com/watch?v=1tibU7vGWpk).

@youtube{1tibU7vGWpk}
12
doc/tutorials/core/table_of_content_core.markdown
Normal file
@@ -0,0 +1,12 @@
The Core Functionality (core module) {#tutorial_table_of_content_core}
=====================================

- @subpage tutorial_mat_the_basic_image_container
- @subpage tutorial_how_to_scan_images
- @subpage tutorial_mat_mask_operations
- @subpage tutorial_mat_operations
- @subpage tutorial_adding_images
- @subpage tutorial_basic_linear_transform
- @subpage tutorial_discrete_fourier_transform
- @subpage tutorial_file_input_output_with_xml_yml
- @subpage tutorial_how_to_use_OpenCV_parallel_for_