init - Initialize project

This commit is contained in:
Lee Nony
2022-05-06 01:58:53 +08:00
commit 90a5cc7cb6
6772 changed files with 2837787 additions and 0 deletions


@@ -0,0 +1,123 @@
Adding (blending) two images using OpenCV {#tutorial_adding_images}
=========================================
@tableofcontents
@prev_tutorial{tutorial_mat_operations}
@next_tutorial{tutorial_basic_linear_transform}
| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |
We will learn how to blend two images!
Goal
----
In this tutorial you will learn:
- what is *linear blending* and why it is useful;
- how to add two images using **addWeighted()**
Theory
------
@note
The explanation below belongs to the book [Computer Vision: Algorithms and
Applications](http://szeliski.org/Book/) by Richard Szeliski
From our previous tutorial, we already know a bit about *pixel operators*. An interesting dyadic
(two-input) operator is the *linear blend operator*:
\f[g(x) = (1 - \alpha)f_{0}(x) + \alpha f_{1}(x)\f]
By varying \f$\alpha\f$ from \f$0 \rightarrow 1\f$ this operator can be used to perform a temporal
*cross-dissolve* between two images or videos, as seen in slide shows and film productions (cool,
eh?)
Source Code
-----------
@add_toggle_cpp
Download the source code from
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/core/AddingImages/AddingImages.cpp).
@include cpp/tutorial_code/core/AddingImages/AddingImages.cpp
@end_toggle
@add_toggle_java
Download the source code from
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/core/AddingImages/AddingImages.java).
@include java/tutorial_code/core/AddingImages/AddingImages.java
@end_toggle
@add_toggle_python
Download the source code from
[here](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/core/AddingImages/adding_images.py).
@include python/tutorial_code/core/AddingImages/adding_images.py
@end_toggle
Explanation
-----------
Since we are going to perform:
\f[g(x) = (1 - \alpha)f_{0}(x) + \alpha f_{1}(x)\f]
We need two source images (\f$f_{0}(x)\f$ and \f$f_{1}(x)\f$). So, we load them in the usual way:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/AddingImages/AddingImages.cpp load
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/AddingImages/AddingImages.java load
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/AddingImages/adding_images.py load
@end_toggle
We used the following images: [LinuxLogo.jpg](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/LinuxLogo.jpg) and [WindowsLogo.jpg](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/WindowsLogo.jpg)
@warning Since we are *adding* *src1* and *src2*, they both have to be of the same size
(width and height) and type.
Now we need to generate the `g(x)` image. For this, the function **addWeighted()** comes in quite handy:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/AddingImages/AddingImages.cpp blend_images
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/AddingImages/AddingImages.java blend_images
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/AddingImages/adding_images.py blend_images
The NumPy version of the line above (the cv function is around 2x faster):
\code{.py}
dst = np.uint8(alpha*(img1)+beta*(img2))
\endcode
@end_toggle
since **addWeighted()** produces:
\f[dst = \alpha \cdot src1 + \beta \cdot src2 + \gamma\f]
In this case, `gamma` is the argument \f$0.0\f$ in the code above.
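For reference, here is a minimal C++ sketch of the call (assuming `src1` and `src2` are already loaded and share size and type):
@code{.cpp}
// a minimal sketch; src1 and src2 must be of equal size and type
double alpha = 0.5;              // weight of the first image
double beta  = 1.0 - alpha;      // weight of the second image
cv::Mat dst;
cv::addWeighted(src1, alpha, src2, beta, 0.0, dst);  // 0.0 is the gamma term
@endcode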
Create windows, show the images and wait for the user to end the program.
@add_toggle_cpp
@snippet cpp/tutorial_code/core/AddingImages/AddingImages.cpp display
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/AddingImages/AddingImages.java display
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/AddingImages/adding_images.py display
@end_toggle
Result
------
![](images/Adding_Images_Tutorial_Result_Big.jpg)


@@ -0,0 +1,325 @@
Changing the contrast and brightness of an image! {#tutorial_basic_linear_transform}
=================================================
@tableofcontents
@prev_tutorial{tutorial_adding_images}
@next_tutorial{tutorial_discrete_fourier_transform}
| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |
Goal
----
In this tutorial you will learn how to:
- Access pixel values
- Initialize a matrix with zeros
- Learn what @ref cv::saturate_cast does and why it is useful
- Get some cool info about pixel transformations
- Improve the brightness of an image in a practical example
Theory
------
@note
The explanation below belongs to the book [Computer Vision: Algorithms and
Applications](http://szeliski.org/Book/) by Richard Szeliski
### Image Processing
- A general image processing operator is a function that takes one or more input images and
produces an output image.
- Image transforms can be seen as:
- Point operators (pixel transforms)
- Neighborhood (area-based) operators
### Pixel Transforms
- In this kind of image processing transform, each output pixel's value depends on only the
corresponding input pixel value (plus, potentially, some globally collected information or
parameters).
- Examples of such operators include *brightness and contrast adjustments* as well as color
correction and transformations.
### Brightness and contrast adjustments
- Two commonly used point processes are *multiplication* and *addition* with a constant:
\f[g(x) = \alpha f(x) + \beta\f]
- The parameters \f$\alpha > 0\f$ and \f$\beta\f$ are often called the *gain* and *bias* parameters;
sometimes these parameters are said to control *contrast* and *brightness* respectively.
- You can think of \f$f(x)\f$ as the source image pixels and \f$g(x)\f$ as the output image pixels. Then,
more conveniently we can write the expression as:
\f[g(i,j) = \alpha \cdot f(i,j) + \beta\f]
where \f$i\f$ and \f$j\f$ indicate that the pixel is located in the *i-th* row and *j-th* column.
Code
----
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp)
- The following code performs the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ :
@include samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java)
- The following code performs the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ :
@include samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py)
- The following code performs the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ :
@include samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py
@end_toggle
Explanation
-----------
- We load an image using @ref cv::imread and save it in a Mat object:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-load
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-load
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-load
@end_toggle
- Now, since we will make some transformations to this image, we need a new Mat object to store
it. Also, we want this to have the following features:
- Initial pixel values equal to zero
- Same size and type as the original image
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-output
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-output
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-output
@end_toggle
We observe that @ref cv::Mat::zeros returns a Matlab-style zero initializer based on
*image.size()* and *image.type()*
- We now ask the user to enter the values of \f$\alpha\f$ and \f$\beta\f$:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-parameters
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-parameters
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-parameters
@end_toggle
- Now, to perform the operation \f$g(i,j) = \alpha \cdot f(i,j) + \beta\f$ we will access each
pixel in the image. Since we are operating with BGR images, we will have three values per pixel (B,
G and R), so we will also access them separately. Here is the piece of code:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-operation
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-operation
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-operation
@end_toggle
Notice the following (**C++ code only**):
- To access each pixel in the images we are using this syntax: *image.at\<Vec3b\>(y,x)[c]*
where *y* is the row, *x* is the column and *c* is B, G or R (0, 1 or 2).
- Since the operation \f$\alpha \cdot p(i,j) + \beta\f$ can produce values out of range or
non-integer values (if \f$\alpha\f$ is a float), we use cv::saturate_cast to make sure the
values are valid (see the short illustration after this list).
- Finally, we create windows and show the images, the usual way.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-display
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java basic-linear-transform-display
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py basic-linear-transform-display
@end_toggle
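As referenced in the list above, here is a minimal illustration of what cv::saturate_cast does:
@code{.cpp}
// out-of-range values are clamped into the valid uchar range [0, 255]
uchar a = cv::saturate_cast<uchar>(300.5);  // a == 255 (clamped from above)
uchar b = cv::saturate_cast<uchar>(-50);    // b == 0   (clamped from below)
uchar c = cv::saturate_cast<uchar>(127.8);  // c == 128 (rounded to nearest)
@endcode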
@note
Instead of using the **for** loops to access each pixel, we could have simply used this command:
@add_toggle_cpp
@code{.cpp}
image.convertTo(new_image, -1, alpha, beta);
@endcode
@end_toggle
@add_toggle_java
@code{.java}
image.convertTo(newImage, -1, alpha, beta);
@endcode
@end_toggle
@add_toggle_python
@code{.py}
new_image = cv.convertScaleAbs(image, alpha=alpha, beta=beta)
@endcode
@end_toggle
where @ref cv::Mat::convertTo would effectively perform *new_image = alpha\*image + beta*. However, we
wanted to show you how to access each pixel. In any case, both methods give the same result but
convertTo is more optimized and works a lot faster.
Result
------
- Running our code and using \f$\alpha = 2.2\f$ and \f$\beta = 50\f$
@code{.bash}
$ ./BasicLinearTransforms lena.jpg
Basic Linear Transforms
-------------------------
* Enter the alpha value [1.0-3.0]: 2.2
* Enter the beta value [0-100]: 50
@endcode
- We get this:
![](images/Basic_Linear_Transform_Tutorial_Result_big.jpg)
Practical example
----
In this paragraph, we will put into practice what we have learned to correct an underexposed image by adjusting the brightness
and the contrast of the image. We will also see another technique to correct the brightness of an image called
gamma correction.
### Brightness and contrast adjustments
Increasing (/ decreasing) the \f$\beta\f$ value will add (/ subtract) a constant value to every pixel. Pixel values outside of the [0 ; 255]
range will be saturated (i.e. a pixel value higher (/ lower) than 255 (/ 0) will be clamped to 255 (/ 0)).
![In light gray, histogram of the original image, in dark gray when brightness = 80 in Gimp](images/Basic_Linear_Transform_Tutorial_hist_beta.png)
The histogram represents, for each color level, the number of pixels with that color level. A dark image will have many pixels with
a low color value and thus the histogram will present a peak in its left part. Adding a constant bias shifts the histogram to the
right, since the same value is added to every pixel.
The \f$\alpha\f$ parameter will modify how the levels spread. If \f$ \alpha < 1 \f$, the color levels will be compressed and the result
will be an image with less contrast.
![In light gray, histogram of the original image, in dark gray when contrast < 0 in Gimp](images/Basic_Linear_Transform_Tutorial_hist_alpha.png)
Note that these histograms have been obtained using the Brightness-Contrast tool in the Gimp software. The brightness tool should be
identical to the \f$\beta\f$ bias parameter, but the contrast tool seems to differ from the \f$\alpha\f$ gain, in that the output range
appears to be centered in Gimp (as you can notice in the previous histogram).
It can happen that adjusting the \f$\beta\f$ bias improves the brightness, but at the same time the image will appear with a
slight veil as the contrast is reduced. The \f$\alpha\f$ gain can be used to diminish this effect, but due to saturation
we will lose some details in the originally bright regions.
### Gamma correction
[Gamma correction](https://en.wikipedia.org/wiki/Gamma_correction) can be used to correct the brightness of an image by using a non-linear
transformation between the input values and the mapped output values:
\f[O = \left( \frac{I}{255} \right)^{\gamma} \times 255\f]
As this relation is non-linear, the effect will not be the same for all pixels and will depend on their original value.
![Plot for different values of gamma](images/Basic_Linear_Transform_Tutorial_gamma.png)
When \f$ \gamma < 1 \f$, the original dark regions will be brighter and the histogram will be shifted to the right whereas it will
be the opposite with \f$ \gamma > 1 \f$.
### Correct an underexposed image
The following image has been corrected with: \f$ \alpha = 1.3 \f$ and \f$ \beta = 40 \f$.
![By Visem (Own work) [CC BY-SA 3.0], via Wikimedia Commons](images/Basic_Linear_Transform_Tutorial_linear_transform_correction.jpg)
The overall brightness has been improved but you can notice that the clouds are now greatly saturated due to the numerical saturation
of the implementation used ([highlight clipping](https://en.wikipedia.org/wiki/Clipping_(photography)) in photography).
The following image has been corrected with: \f$ \gamma = 0.4 \f$.
![By Visem (Own work) [CC BY-SA 3.0], via Wikimedia Commons](images/Basic_Linear_Transform_Tutorial_gamma_correction.jpg)
Gamma correction tends to add less of a saturation effect, as the mapping is non-linear and no numerical saturation is possible as in the previous method.
![Left: histogram after alpha, beta correction ; Center: histogram of the original image ; Right: histogram after the gamma correction](images/Basic_Linear_Transform_Tutorial_histogram_compare.png)
The previous figure compares the histograms for the three images (the y-ranges are not the same between the three histograms).
You can notice that most of the pixel values are in the lower part of the histogram for the original image. After the \f$ \alpha \f$,
\f$ \beta \f$ correction, we can observe a big peak at 255 due to the saturation, as well as a shift to the right.
After gamma correction, the histogram is shifted to the right, but the pixels in the dark regions are shifted more
(see the gamma curves [figure](Basic_Linear_Transform_Tutorial_gamma.png)) than those in the bright regions.
In this tutorial, you have seen two simple methods to adjust the contrast and the brightness of an image. **They are basic techniques
and are not intended to be used as a replacement of a raster graphics editor!**
### Code
@add_toggle_cpp
Code for the tutorial is [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/ImgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.cpp).
@end_toggle
@add_toggle_java
Code for the tutorial is [here](https://github.com/opencv/opencv/blob/master/samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/ChangingContrastBrightnessImageDemo.java).
@end_toggle
@add_toggle_python
Code for the tutorial is [here](https://github.com/opencv/opencv/blob/master/samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.py).
@end_toggle
Code for the gamma correction:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ImgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.cpp changing-contrast-brightness-gamma-correction
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/ChangingContrastBrightnessImageDemo.java changing-contrast-brightness-gamma-correction
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.py changing-contrast-brightness-gamma-correction
@end_toggle
A look-up table is used to improve the performance of the computation, as only the 256 possible values need to be calculated once.
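The approach boils down to something like the following sketch (`img` and `gamma_` are illustrative names, not necessarily those used in the sample):
@code{.cpp}
// a sketch of LUT-based gamma correction
cv::Mat lookUpTable(1, 256, CV_8U);
uchar* p = lookUpTable.ptr();
for (int i = 0; i < 256; ++i)
    p[i] = cv::saturate_cast<uchar>(std::pow(i / 255.0, gamma_) * 255.0);
cv::Mat res;
cv::LUT(img, lookUpTable, res);  // applies the 256-entry table to every pixel
@endcode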
### Additional resources
- [Gamma correction in graphics rendering](https://learnopengl.com/#!Advanced-Lighting/Gamma-Correction)
- [Gamma correction and images displayed on CRT monitors](http://www.graphics.cornell.edu/~westin/gamma/gamma.html)
- [Digital exposure techniques](http://www.cambridgeincolour.com/tutorials/digital-exposure-techniques.htm)


@@ -0,0 +1,245 @@
Discrete Fourier Transform {#tutorial_discrete_fourier_transform}
==========================
@tableofcontents
@prev_tutorial{tutorial_basic_linear_transform}
@next_tutorial{tutorial_file_input_output_with_xml_yml}
| | |
| -: | :- |
| Original author | Bernát Gábor |
| Compatibility | OpenCV >= 3.0 |
Goal
----
We'll seek answers for the following questions:
- What is a Fourier transform and why use it?
- How to do it in OpenCV?
- Usage of functions such as: **copyMakeBorder()** , **merge()** , **dft()** ,
**getOptimalDFTSize()** , **log()** and **normalize()** .
Source code
-----------
@add_toggle_cpp
You can [download this from here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp) or
find it in the
`samples/cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp` of the
OpenCV source code library.
@end_toggle
@add_toggle_java
You can [download this from here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java) or
find it in the
`samples/java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java` of the
OpenCV source code library.
@end_toggle
@add_toggle_python
You can [download this from here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py) or
find it in the
`samples/python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py` of the
OpenCV source code library.
@end_toggle
Here's a sample usage of **dft()** :
@add_toggle_cpp
@include cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp
@end_toggle
@add_toggle_java
@include java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java
@end_toggle
@add_toggle_python
@include python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py
@end_toggle
Explanation
-----------
The Fourier Transform will decompose an image into its sine and cosine components. In other words,
it will transform an image from its spatial domain to its frequency domain. The idea is that any
function may be approximated exactly by a sum of infinitely many sine and cosine functions. The
Fourier Transform is a way to do this. Mathematically, the Fourier transform of a two-dimensional image
is:
\f[F(k,l) = \displaystyle\sum\limits_{i=0}^{N-1}\sum\limits_{j=0}^{N-1} f(i,j)e^{-i2\pi(\frac{ki}{N}+\frac{lj}{N})}\f]\f[e^{ix} = \cos{x} + i\sin {x}\f]
Here f is the image value in the spatial domain and F in the frequency domain. The result of the
transformation is complex numbers. It can be displayed either via a *real* image and a
*complex* image or via a *magnitude* and a *phase* image. However, throughout image processing
algorithms only the *magnitude* image is interesting, as it contains all the information we need
about the image's geometric structure. Nevertheless, if you intend to make some modifications of the
image in these forms and then retransform it, you'll need to preserve both.
In this sample I'll show how to calculate and show the *magnitude* image of a Fourier Transform.
Digital images are discrete: they may only take values from a given domain. For example, in a basic
grayscale image the values usually lie between zero and 255. Therefore the
Fourier Transform too needs to be of a discrete type, resulting in a Discrete Fourier Transform
(*DFT*). You'll want to use this whenever you need to determine the structure of an image from a
geometrical point of view. Here are the steps to follow (in case of a grayscale input image *I*):
#### Expand the image to an optimal size
The performance of a DFT depends on the image
size. It tends to be fastest for image sizes that are multiples of two, three and
five. Therefore, to achieve maximal performance it is generally a good idea to pad border values
onto the image to get a size with such traits. The **getOptimalDFTSize()** function returns this
optimal size and we can use the **copyMakeBorder()** function to expand the borders of an
image (the appended pixels are initialized with zero):
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp expand
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java expand
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py expand
@end_toggle
#### Make place for both the complex and the real values
The result of a Fourier Transform is
complex. This implies that for each image value the result is two image values (one per
component). Moreover, the frequency domain's range is much larger than its spatial counterpart.
Therefore, we usually store these at least in a *float* format. We'll convert our
input image to this type and expand it with another channel to hold the complex values:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp complex_and_real
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java complex_and_real
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py complex_and_real
@end_toggle
#### Make the Discrete Fourier Transform
An in-place calculation is possible (same input as
output):
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp dft
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java dft
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py dft
@end_toggle
#### Transform the real and complex values to magnitude
A complex number has a real (*Re*) and an
imaginary (*Im*) part. The results of a DFT are complex numbers. The magnitude of a
DFT is:
\f[M = \sqrt{ {Re(DFT(I))}^2 + {Im(DFT(I))}^2}\f]
Translated to OpenCV code:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp magnitude
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java magnitude
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py magnitude
@end_toggle
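As a C++ sketch (assuming `complexI` holds the two-channel DFT result):
@code{.cpp}
cv::Mat planes[2];
cv::split(complexI, planes);                      // planes[0] = Re, planes[1] = Im
cv::magnitude(planes[0], planes[1], planes[0]);   // planes[0] = sqrt(Re^2 + Im^2)
cv::Mat magI = planes[0];
@endcode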
#### Switch to a logarithmic scale
It turns out that the dynamic range of the Fourier
coefficients is too large to be displayed on screen: some values are too small and others too
high to observe this way. The high values would all show up as
white points, while the small ones as black. To use the grayscale range for visualization
we can transform our linear scale to a logarithmic one:
\f[M_1 = \log{(1 + M)}\f]
Translated to OpenCV code:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp log
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java log
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py log
@end_toggle
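As a C++ sketch (with `magI` the magnitude image from the previous step):
@code{.cpp}
// switch to a logarithmic scale: M1 = log(1 + M)
magI += cv::Scalar::all(1);
cv::log(magI, magI);
@endcode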
#### Crop and rearrange
Remember that in the first step we expanded the image? Well, it's time
to throw away the newly introduced values. For visualization purposes we may also rearrange the
quadrants of the result, so that the origin (zero, zero) corresponds to the image center.
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp crop_rearrange
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java crop_rearrange
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py crop_rearrange
@end_toggle
#### Normalize
This is done again for visualization purposes. We now have the magnitudes,
however these are still outside our image display range of zero to one. We normalize our values to
this range using the @ref cv::normalize() function.
@add_toggle_cpp
@snippet cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp normalize
@end_toggle
@add_toggle_java
@snippet java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java normalize
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py normalize
@end_toggle
Result
------
An application idea would be to determine the geometrical orientation present in the image. For
example, let us find out whether a text is horizontal or not. Looking at some text you'll notice that the
text lines sort of form horizontal lines and the letters form sort of vertical lines. These two
main components of a text snippet may also be seen in its Fourier transform. Let us use
[this horizontal](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/imageTextN.png) and [this rotated](https://raw.githubusercontent.com/opencv/opencv/master/samples/data/imageTextR.png)
image of a text.
In case of the horizontal text:
![](images/result_normal.jpg)
In case of a rotated text:
![](images/result_rotated.jpg)
You can see that the most influential components of the frequency domain (the brightest dots on the
magnitude image) follow the geometric rotation of objects in the image. From this we may calculate
the offset and perform an image rotation to correct eventual misalignments.


@@ -0,0 +1,296 @@
File Input and Output using XML and YAML files {#tutorial_file_input_output_with_xml_yml}
==============================================
@tableofcontents
@prev_tutorial{tutorial_discrete_fourier_transform}
@next_tutorial{tutorial_how_to_use_OpenCV_parallel_for_}
| | |
| -: | :- |
| Original author | Bernát Gábor |
| Compatibility | OpenCV >= 3.0 |
Goal
----
You'll find answers for the following questions:
- How to print and read text entries to files in OpenCV using YAML or XML?
- How to do the same for OpenCV data structures?
- How to do this for your data structures?
- Usage of OpenCV data structures such as @ref cv::FileStorage , @ref cv::FileNode or @ref
cv::FileNodeIterator .
Source code
-----------
@add_toggle_cpp
You can [download this from here
](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/core/file_input_output/file_input_output.cpp) or find it in the
`samples/cpp/tutorial_code/core/file_input_output/file_input_output.cpp` of the OpenCV source code
library.
Here's sample code on how to achieve everything enumerated in the goal list.
@include cpp/tutorial_code/core/file_input_output/file_input_output.cpp
@end_toggle
@add_toggle_python
You can [download this from here
](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/core/file_input_output/file_input_output.py) or find it in the
`samples/python/tutorial_code/core/file_input_output/file_input_output.py` of the OpenCV source code
library.
Here's sample code on how to achieve everything enumerated in the goal list.
@include python/tutorial_code/core/file_input_output/file_input_output.py
@end_toggle
Explanation
-----------
Here we talk only about XML and YAML file inputs. Your output (and its respective input) file may
have only one of these extensions, and the structure follows from that. There are two kinds of data
structures you may serialize: *mappings* (like the STL map and the Python dictionary) and *element sequences* (like the STL
vector). The difference between them is that in a map every element has a unique name by which
you may access it; for sequences you need to go through them to query a specific item.
-# **XML/YAML File Open and Close.** Before you write any content to such a file you need to open it,
and at the end close it. The XML/YAML data structure in OpenCV is @ref cv::FileStorage . To
specify the file to which this structure binds on your hard drive, you can use either its
constructor or the *open()* function:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp open
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py open
@end_toggle
Whichever one you use, the second argument is a constant specifying the type of operations
you'll be able to perform on it: WRITE, READ or APPEND. The extension specified in the file name also
determines the output format that will be used. The output may even be compressed if you
specify an extension such as *.xml.gz*.
The file automatically closes when the @ref cv::FileStorage object is destroyed. However, you
may explicitly call for this by using the *release* function:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp close
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py close
@end_toggle
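Put together, a minimal C++ sketch of the open/close cycle (the file name is illustrative):
@code{.cpp}
cv::FileStorage fs("outputfile.yml", cv::FileStorage::WRITE);
// ... write data here ...
fs.release();  // explicit close; also happens automatically in the destructor
@endcode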
-# **Input and Output of text and numbers.** In C++, the data structure uses the \<\< output
operator of the STL library. In Python, @ref cv::FileStorage.write() is used instead. For
outputting any type of data structure we first need to specify its name. We do this by
simply pushing the name to the stream in C++. In Python, the first parameter of the
write function is the name. For basic types you may follow this with the print of the value:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp writeNum
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py writeNum
@end_toggle
Reading in is a simple addressing (via the [] operator) and casting operation, or a read via
the \>\> operator. In Python, we address with getNode() and use real():
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp readNum
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py readNum
@end_toggle
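As a C++ sketch (the node name matches the output shown later in this tutorial):
@code{.cpp}
int itNr;
fs["iterationNr"] >> itNr;       // read via the >> operator
itNr = (int) fs["iterationNr"];  // or address with [] and cast
@endcode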
-# **Input/Output of OpenCV Data structures.** These behave exactly like the basic C++
and Python types:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp iomati
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp iomatw
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp iomat
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py iomati
@snippet python/tutorial_code/core/file_input_output/file_input_output.py iomatw
@snippet python/tutorial_code/core/file_input_output/file_input_output.py iomat
@end_toggle
-# **Input/Output of vectors (arrays) and associative maps.** As I mentioned beforehand, we can
output maps and sequences (array, vector) too. Again, we first print the name of the variable and
then we have to specify whether our output is a sequence or a map.
For sequences, print the "[" character before the first element and the "]" character after
the last one. With Python, call `FileStorage.startWriteStruct(structure_name, struct_type)`,
where `struct_type` is `cv2.FileNode_MAP` or `cv2.FileNode_SEQ`, to start writing the structure.
Call `FileStorage.endWriteStruct()` to finish it:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp writeStr
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py writeStr
@end_toggle
For maps the drill is the same however now we use the "{" and "}" delimiter characters:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp writeMap
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py writeMap
@end_toggle
To read from these we use the @ref cv::FileNode and the @ref cv::FileNodeIterator data
structures. The [] operator of the @ref cv::FileStorage class (or the getNode() function in Python) returns a @ref cv::FileNode data
type. If the node is sequential we can use the @ref cv::FileNodeIterator to iterate through the
items. In Python, the at() function can be used to address elements of the sequence and the
size() function returns the length of the sequence:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp readStr
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py readStr
@end_toggle
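As a C++ sketch (the "strings" node appears in the output shown later):
@code{.cpp}
cv::FileNode n = fs["strings"];
for (cv::FileNodeIterator it = n.begin(); it != n.end(); ++it)
    std::cout << (std::string)*it << std::endl;
@endcode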
For maps you can use the [] operator (at() function in Python) again to access the given item (or the \>\> operator too):
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp readMap
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py readMap
@end_toggle
-# **Read and write your own data structures.** Suppose you have a data structure such as:
@add_toggle_cpp
@code{.cpp}
class MyData
{
public:
MyData() : A(0), X(0), id() {}
public: // Data Members
int A;
double X;
string id;
};
@endcode
@end_toggle
@add_toggle_python
@code{.py}
class MyData:
def __init__(self):
self.A = self.X = 0
self.name = ''
@endcode
@end_toggle
In C++, it's possible to serialize this through the OpenCV I/O XML/YAML interface (just as
in the case of the OpenCV data structures) by adding a read and a write function inside and outside of your
class. In Python, you can get close to this by implementing a read and a write function inside
the class. For the inside part:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp inside
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py inside
@end_toggle
@add_toggle_cpp
In C++, you need to add the following functions definitions outside the class:
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp outside
@end_toggle
Here you can observe that in the read section we defined what happens if the user tries to read
a non-existing node. In this case we just return the default initialization value; however, a
more verbose solution would be to return, for instance, a minus-one value for an object ID.
Once you have added these four functions, use the \<\< operator for write and the \>\> operator for
read (or the defined input/output functions for Python):
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp customIOi
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp customIOw
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp customIO
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py customIOi
@snippet python/tutorial_code/core/file_input_output/file_input_output.py customIOw
@snippet python/tutorial_code/core/file_input_output/file_input_output.py customIO
@end_toggle
Or to try out reading a non-existing node:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp nonexist
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py nonexist
@end_toggle
Result
------
Mostly we just print out the defined numbers. On your console screen you could see:
@code{.bash}
Write Done.
Reading:
100image1.jpg
Awesomeness
baboon.jpg
Two 2; One 1
R = [1, 0, 0;
0, 1, 0;
0, 0, 1]
T = [0; 0; 0]
MyData =
{ id = mydata1234, X = 3.14159, A = 97}
Attempt to read NonExisting (should initialize the data structure with its default).
NonExisting =
{ id = , X = 0, A = 0}
Tip: Open up output.xml with a text editor to see the serialized data.
@endcode
Nevertheless, it's much more interesting to see what's in the output XML file:
@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<iterationNr>100</iterationNr>
<strings>
image1.jpg Awesomeness baboon.jpg</strings>
<Mapping>
<One>1</One>
<Two>2</Two></Mapping>
<R type_id="opencv-matrix">
<rows>3</rows>
<cols>3</cols>
<dt>u</dt>
<data>
1 0 0 0 1 0 0 0 1</data></R>
<T type_id="opencv-matrix">
<rows>3</rows>
<cols>1</cols>
<dt>d</dt>
<data>
0. 0. 0.</data></T>
<MyData>
<A>97</A>
<X>3.1415926535897931e+000</X>
<id>mydata1234</id></MyData>
</opencv_storage>
@endcode
Or the YAML file:
@code{.yaml}
%YAML:1.0
iterationNr: 100
strings:
- "image1.jpg"
- Awesomeness
- "baboon.jpg"
Mapping:
One: 1
Two: 2
R: !!opencv-matrix
rows: 3
cols: 3
dt: u
data: [ 1, 0, 0, 0, 1, 0, 0, 0, 1 ]
T: !!opencv-matrix
rows: 3
cols: 1
dt: d
data: [ 0., 0., 0. ]
MyData:
A: 97
X: 3.1415926535897931e+000
id: mydata1234
@endcode
You may observe a runtime instance of this on the [YouTube
here](https://www.youtube.com/watch?v=A4yqVnByMMM) .
@youtube{A4yqVnByMMM}


@@ -0,0 +1,227 @@
How to scan images, lookup tables and time measurement with OpenCV {#tutorial_how_to_scan_images}
==================================================================
@tableofcontents
@prev_tutorial{tutorial_mat_the_basic_image_container}
@next_tutorial{tutorial_mat_mask_operations}
| | |
| -: | :- |
| Original author | Bernát Gábor |
| Compatibility | OpenCV >= 3.0 |
Goal
----
We'll seek answers for the following questions:
- How to go through each and every pixel of an image?
- How are OpenCV matrix values stored?
- How to measure the performance of our algorithm?
- What are lookup tables and why use them?
Our test case
-------------
Let us consider a simple color reduction method. By using the unsigned char C and C++ type for
matrix item storage, a channel of a pixel may have up to 256 different values. For a three channel
image this can allow the formation of way too many colors (16 million, to be exact). Working with so
many color shades may give a heavy blow to our algorithm's performance. However, sometimes it is
enough to work with far fewer of them to get the same final result.
In these cases it's common to perform a *color space reduction*. This means that we divide the
current color space value by a new input value to end up with fewer colors. For instance, every
value between zero and nine takes the new value zero, every value between ten and nineteen the value
ten, and so on.
When you divide an *uchar* (unsigned char - aka values between zero and 255) value by an *int*
value, the result is again an integer: any fraction is rounded down. Taking advantage of this
fact, the operation above in the *uchar* domain may be
expressed as:
\f[I_{new} = (\frac{I_{old}}{10}) * 10\f]
A simple color space reduction algorithm would consist of just passing through every pixel of an
image matrix and applying this formula. It's worth noting that we do a division and a multiplication
operation. These operations are bloody expensive for a system. If possible, it's worth avoiding them
by using cheaper operations such as a few subtractions, additions or, in the best case, a simple
assignment. Furthermore, note that we only have a limited number of input values for the
operation above; in case of the *uchar* system this is 256, to be exact.
Therefore, for larger images it would be wise to calculate all possible values beforehand and,
during the assignment, just make the assignment using a lookup table. Lookup tables are simple arrays
(having one or more dimensions) that for a given input value hold the final output value.
Their strength is that we do not need to make the calculation; we just need to read the result.
Our test case program (and the code sample below) will do the following: read in an image passed
as a command line argument (it may be either color or grayscale) and apply the reduction
with the given command line argument integer value. In OpenCV, at the moment there are
three major ways of going through an image pixel by pixel. To make things a little more interesting
we'll make the scanning of the image using each of these methods, and print out how long it took.
You can download the full source code [here
](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/core/how_to_scan_images/how_to_scan_images.cpp) or look it up in
the samples directory of OpenCV at the cpp tutorial code for the core section. Its basic usage is:
@code{.bash}
how_to_scan_images imageName.jpg intValueToReduce [G]
@endcode
The final argument is optional. If given, the image will be loaded in grayscale format, otherwise
the BGR color space is used. The first thing to do is to calculate the lookup table.
@snippet how_to_scan_images.cpp dividewith
Here we first use the C++ *stringstream* class to convert the third command line argument from text
to an integer format. Then we use a simple loop and the formula above to calculate the lookup table.
No OpenCV specific stuff here.
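The table construction boils down to something like this sketch (with `divideWith` the parsed command-line integer):
@code{.cpp}
uchar table[256];
for (int i = 0; i < 256; ++i)
    table[i] = (uchar)(divideWith * (i / divideWith));  // quantize to multiples of divideWith
@endcode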
Another issue is how we measure time. Well, OpenCV offers two simple functions to achieve this:
cv::getTickCount() and cv::getTickFrequency(). The first returns the number of ticks of
your system's CPU since a certain event (like since you booted your system). The second returns how
many ticks your CPU emits in a second. So, measuring the amount of time elapsed between
two operations is as easy as:
@code{.cpp}
double t = (double)getTickCount();
// do something ...
t = ((double)getTickCount() - t)/getTickFrequency();
cout << "Times passed in seconds: " << t << endl;
@endcode
@anchor tutorial_how_to_scan_images_storing
How is the image matrix stored in memory?
-----------------------------------------
As you could already read in my @ref tutorial_mat_the_basic_image_container tutorial, the size of the matrix
depends on the color system used. More accurately, it depends on the number of channels used. In
case of a grayscale image we have something like:
![](tutorial_how_matrix_stored_1.png)
For multichannel images the columns contain as many sub columns as the number of channels. For
example, in case of a BGR color system:
![](tutorial_how_matrix_stored_2.png)
Note that the order of the channels is inverse: BGR instead of RGB. Because in many cases the memory
is large enough to store the rows in a successive fashion, the rows may follow one after another,
creating a single long row. Because everything is in a single place, following one after another, this
may help speed up the scanning process. We can use the cv::Mat::isContinuous() function to *ask*
the matrix if this is the case. Continue on to the next section to find an example.
The efficient way
-----------------
When it comes to performance you cannot beat the classic C style operator[] (pointer) access.
Therefore, the most efficient method we can recommend for making the assignment is:
@snippet how_to_scan_images.cpp scan-c
Here we basically just acquire a pointer to the start of each row and go through it until it ends.
In the special case that the matrix is stored in a continuous manner we only need to request the
pointer a single time and go all the way to the end. We need to look out for color images: we have
three channels so we need to pass through three times more items in each row.
There's another way to do this. The *data* member of a *Mat* object returns the pointer to the
first row, first column. If this pointer is null you have no valid input in that object; checking
this is the simplest way to verify that your image loaded successfully. In case the storage is
continuous we can use this to go through the whole data buffer. In case of a grayscale image this
would look like:
@code{.cpp}
uchar* p = I.data;
for (unsigned int i = 0; i < nrows*ncols; ++i, ++p)
    *p = table[*p];  // nrows and ncols are the row and column counts of I
@endcode
You would get the same result. However, this code is a lot harder to read later on. It gets even
harder if you have some more advanced techniques there. Moreover, in practice I've observed that you'll
get the same performance result (as most modern compilers will probably make this small
optimization trick automatically for you).
The iterator (safe) method
--------------------------
With the efficient way, making sure that you pass through the right number of *uchar* fields
and skip the gaps that may occur between the rows was your responsibility. The iterator method is
considered a safer way as it takes over these tasks from the user. All you need to do is ask for the
begin and the end of the image matrix and then just increase the begin iterator until you reach the
end. To acquire the value *pointed to* by the iterator, use the \* operator (add it before the iterator).
@snippet how_to_scan_images.cpp scan-iterator
In case of color images we have three uchar items per column. This may be considered a short vector
of uchar items, which has been baptized in OpenCV with the *Vec3b* name. To access the n-th sub
column we use the simple operator[] access. It's important to remember that OpenCV iterators go through
the columns and automatically skip to the next row. Therefore, in case of color images, if you use a
simple *uchar* iterator you'll be able to access only the blue channel values.
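As a sketch (assuming `I` is a BGR image and `table` is the lookup array from earlier):
@code{.cpp}
cv::MatIterator_<cv::Vec3b> it, end;
for (it = I.begin<cv::Vec3b>(), end = I.end<cv::Vec3b>(); it != end; ++it)
{
    (*it)[0] = table[(*it)[0]];  // blue
    (*it)[1] = table[(*it)[1]];  // green
    (*it)[2] = table[(*it)[2]];  // red
}
@endcode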
On-the-fly address calculation with reference returning
-------------------------------------------------------
The final method isn't recommended for scanning. It was made to acquire or modify somewhat random
elements in the image. Its basic usage is to specify the row and column number of the item you want
to access. During our earlier scanning methods you could already notice that it is important through
what type we look at the image. It's no different here, as you need to manually specify what
type to use for the automatic lookup. You can observe this for grayscale images in the
following source code (the usage of the cv::Mat::at() function):
@snippet how_to_scan_images.cpp scan-random
The function takes your input type and coordinates and calculates the address of the
queried item. Then it returns a reference to it. This may be constant when you *get* the value and
non-constant when you *set* the value. As a safety step, in **debug mode only** there is a check
performed to ensure that your input coordinates are valid and do exist. If this isn't the case you'll get a
nice output message about this on the standard error output stream. Compared to the efficient way, in
release mode the only difference is that for every element of the image you'll get a
new row pointer, for which we use the C operator[] to acquire the column element.
If you need to do multiple lookups using this method for an image, it may be troublesome and time
consuming to enter the type and the at keyword for each of the accesses. To solve this problem
OpenCV has a cv::Mat_ data type. It's the same as Mat, with the extra requirement that at definition
you need to specify the data type through which to look at the data matrix; in return, you can
use the operator() for fast access of items. To make things even better, this is easily convertible
from and to the usual cv::Mat data type. A sample usage of this you can see in the
color-image case of the function above. Nevertheless, it's important to note that the same operation
(with the same runtime speed) could have been done with the cv::Mat::at function. It's just a
write-less trick for the lazy programmer.
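A minimal sketch of the cv::Mat_ idea (the indices are illustrative):
@code{.cpp}
cv::Mat_<cv::Vec3b> J = I;   // cheap header conversion; the pixel data is shared
J(10, 10)[2] = 255;          // set the red channel of the pixel at row 10, column 10
I = J;                       // convert back if the cv::Mat type is needed again
@endcode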
The Core Function
-----------------
This is a bonus method of achieving lookup table modification in an image. In image
processing it's quite common that you want to modify all values of a given image to some other value.
OpenCV provides a function for modifying image values without the need to write the scanning logic
of the image. We use the cv::LUT() function of the core module. First we build a Mat type for the
lookup table:
@snippet how_to_scan_images.cpp table-init
Finally call the function (I is our input image and J the output one):
@snippet how_to_scan_images.cpp table-use
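Put together, a hedged sketch of the whole LUT path (reusing the `table` array built earlier):
@code{.cpp}
cv::Mat lookUpTable(1, 256, CV_8U);
uchar* p = lookUpTable.ptr();
for (int i = 0; i < 256; ++i)
    p[i] = table[i];             // copy the precomputed values into the Mat table
cv::LUT(I, lookUpTable, J);      // OpenCV scans the image (multi-threaded where possible)
@endcode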
Performance Difference
----------------------
For the best result compile the program and run it yourself. To make the differences clearer,
I've used a quite large (2560 x 1600) image. The performance presented here is for
color images. For a more accurate value I've averaged the values I got from calling the function
a hundred times.
Method | Time
--------------- | ----------------------
Efficient Way | 79.4717 milliseconds
Iterator | 83.7201 milliseconds
On-The-Fly RA | 93.7878 milliseconds
LUT function | 32.5759 milliseconds
We can conclude a couple of things. If possible, use the already-made functions of OpenCV (instead
of reinventing them). The fastest method turns out to be the LUT function. This is because the OpenCV
library is multi-thread enabled via Intel Threading Building Blocks. However, if you need to write a
simple image scan, prefer the pointer method. The iterator is a safer bet, though quite a bit slower.
Using the on-the-fly reference access method for a full image scan is the most costly in debug mode.
In release mode it may or may not beat the iterator approach, but it surely sacrifices
the safety trait of iterators for this.
Finally, you may watch a sample run of the program on the [video posted](https://www.youtube.com/watch?v=fB3AN5fjgwc) on our YouTube channel.
@youtube{fB3AN5fjgwc}


@@ -0,0 +1,196 @@
How to use the OpenCV parallel_for_ to parallelize your code {#tutorial_how_to_use_OpenCV_parallel_for_}
==================================================================
@tableofcontents
@prev_tutorial{tutorial_file_input_output_with_xml_yml}
| | |
| -: | :- |
| Compatibility | OpenCV >= 3.0 |
Goal
----
The goal of this tutorial is to show you how to use the OpenCV `parallel_for_` framework to easily
parallelize your code. To illustrate the concept, we will write a program to draw a Mandelbrot set
exploiting almost all the CPU power available.
The full tutorial code is [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp).
If you want more information about multithreading, you will have to refer to a reference book or course as this tutorial is intended
to remain simple.
Precondition
----
The first precondition is to have OpenCV built with a parallel framework.
In OpenCV 3.2, the following parallel frameworks are available in that order:
1. Intel Threading Building Blocks (3rdparty library, should be explicitly enabled)
2. C= Parallel C/C++ Programming Language Extension (3rdparty library, should be explicitly enabled)
3. OpenMP (integrated to compiler, should be explicitly enabled)
4. APPLE GCD (system wide, used automatically (APPLE only))
5. Windows RT concurrency (system wide, used automatically (Windows RT only))
6. Windows concurrency (part of runtime, used automatically (Windows only - MSVC++ >= 10))
7. Pthreads (if available)
As you can see, several parallel frameworks can be used in the OpenCV library. Some parallel libraries
are third party libraries and have to be explicitly built and enabled in CMake (e.g. TBB, C=), others are
automatically available with the platform (e.g. APPLE GCD); but chances are you will be able to
access a parallel framework, either directly or by enabling the option in CMake and rebuilding the library.
The second (weak) precondition is more related to the task you want to achieve, as not all computations
are suitable / can be adapted to run in a parallel way. To remain simple, tasks that can be split
into multiple elementary operations with no memory dependency (no possible race condition) are easily
parallelizable. Computer vision processing is often easily parallelizable, as most of the time the processing of
one pixel does not depend on the state of other pixels.
Simple example: drawing a Mandelbrot set
----
We will use the example of drawing a Mandelbrot set to show how, starting from a regular sequential code, you can easily adapt
the code to parallelize the computation.
Theory
-----------
The Mandelbrot set was named in tribute to the mathematician Benoit Mandelbrot by the mathematician
Adrien Douady. It has become famous outside the mathematics field as the image representation is an example of a
class of fractals, mathematical sets that exhibit a repeating pattern displayed at every scale (even more, the
Mandelbrot set is self-similar as the whole shape can be repeatedly seen at different scales). For a more in-depth
introduction, you can look at the corresponding [Wikipedia article](https://en.wikipedia.org/wiki/Mandelbrot_set).
Here, we will just introduce the formula to draw the Mandelbrot set (from the mentioned Wikipedia article).
> The Mandelbrot set is the set of values of \f$ c \f$ in the complex plane for which the orbit of 0 under iteration
> of the quadratic map
> \f[\begin{cases} z_0 = 0 \\ z_{n+1} = z_n^2 + c \end{cases}\f]
> remains bounded.
> That is, a complex number \f$ c \f$ is part of the Mandelbrot set if, when starting with \f$ z_0 = 0 \f$ and applying
> the iteration repeatedly, the absolute value of \f$ z_n \f$ remains bounded however large \f$ n \f$ gets.
> This can also be represented as
> \f[\limsup_{n\to\infty}|z_{n+1}|\leqslant2\f]
Pseudocode
-----------
A simple algorithm to generate a representation of the Mandelbrot set is called the
["escape time algorithm"](https://en.wikipedia.org/wiki/Mandelbrot_set#Escape_time_algorithm).
For each pixel in the rendered image, we test, using the recurrence relation, whether the complex number remains bounded
under a maximum number of iterations. Pixels that do not belong to the Mandelbrot set will escape quickly, whereas
we assume that the pixel is in the set after a fixed maximum number of iterations. A high number of iterations will
produce a more detailed image, but the computation time will increase accordingly. We use the number of iterations
needed to "escape" to depict the pixel value in the image.
```
For each pixel (Px, Py) on the screen, do:
{
    x0 = scaled x coordinate of pixel (scaled to lie in the Mandelbrot X scale (-2, 1))
    y0 = scaled y coordinate of pixel (scaled to lie in the Mandelbrot Y scale (-1, 1))
    x = 0.0
    y = 0.0
    iteration = 0
    max_iteration = 1000
    while (x*x + y*y < 2*2 AND iteration < max_iteration) {
        xtemp = x*x - y*y + x0
        y = 2*x*y + y0
        x = xtemp
        iteration = iteration + 1
    }
    color = palette[iteration]
    plot(Px, Py, color)
}
```
To relate between the pseudocode and the theory, we have:
* \f$ z = x + iy \f$
* \f$ z^2 = x^2 + i2xy - y^2 \f$
* \f$ c = x_0 + iy_0 \f$
![](images/how_to_use_OpenCV_parallel_for_640px-Mandelset_hires.png)
In this figure, recall that the real part of a complex number lies on the x-axis and the imaginary part on the y-axis.
You can see that the whole shape is repeatedly visible if we zoom in at particular locations.
Implementation
-----------
Escape time algorithm implementation
--------------------------
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-escape-time-algorithm
Here, we used the [`std::complex`](http://en.cppreference.com/w/cpp/numeric/complex) template class to represent a
complex number. This function performs the test to check whether the pixel is in the set or not and returns the "escaped" iteration.
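For reference, such a function could be sketched as follows (a minimal illustration; the names are ours and not necessarily those of the sample):
@code{.cpp}
#include <complex>

// Iterate z_{n+1} = z_n^2 + c, starting from z = c (the first iterate after z_0 = 0),
// and return the iteration at which |z| exceeds 2, or maxIter if the point never escapes.
int mandelbrot(const std::complex<float> &z0, const int maxIter)
{
    std::complex<float> z = z0;
    for (int t = 0; t < maxIter; t++)
    {
        if (z.real()*z.real() + z.imag()*z.imag() > 4.0f)
            return t;
        z = z*z + z0;
    }
    return maxIter;
}
@endcode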
Sequential Mandelbrot implementation
--------------------------
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-sequential
In this implementation, we sequentially iterate over the pixels in the rendered image and perform the test to check
whether each pixel is likely to belong to the Mandelbrot set or not.
We also have to transform the pixel coordinates into the Mandelbrot set space with:
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-transformation
Finally, to assign the grayscale value to the pixels, we use the following rule:
* a pixel is black if it reaches the maximum number of iterations (pixel is assumed to be in the Mandelbrot set),
* otherwise we assign a grayscale value depending on the escaped iteration and scaled to fit the grayscale range.
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-grayscale-value
Using a linear scale transformation is not enough to perceive the grayscale variation. To overcome this, we will boost
the perception by using a square root scale transformation (borrowed from Jeremy D. Frens in his
[blog post](http://www.programming-during-recess.net/2016/06/26/color-schemes-for-mandelbrot-sets/)):
\f$ f \left( x \right) = \sqrt{\frac{x}{\text{maxIter}}} \times 255 \f$
![](images/how_to_use_OpenCV_parallel_for_sqrt_scale_transformation.png)
The green curve corresponds to a simple linear scale transformation, the blue one to a square root scale transformation,
and you can observe how the lowest values are boosted by looking at the slope at these positions.
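In code, this value assignment could look like the following sketch (assuming `value` is the escaped iteration and `maxIter` the iteration limit):
@code{.cpp}
#include <cmath>
#include <opencv2/core.hpp> // for cvRound

// Map an escaped iteration to a grayscale value with the square root
// scale transformation; pixels assumed inside the set stay black.
int mandelbrotGrayValue(int value, int maxIter)
{
    if (value == maxIter)
        return 0; // reached the iteration limit: assumed in the set
    return cvRound(std::sqrt(value / (float) maxIter) * 255);
}
@endcode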
Parallel Mandelbrot implementation
--------------------------
When looking at the sequential implementation, we can notice that each pixel is computed independently. To optimize the
computation, we can perform multiple pixel calculations in parallel, exploiting the multi-core architecture of modern
processors. To achieve this easily, we will use the OpenCV @ref cv::parallel_for_ framework.
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel
The first thing is to declare a custom class that inherits from @ref cv::ParallelLoopBody and to override the
`virtual void operator ()(const cv::Range& range) const`.
The range in the `operator ()` represents the subset of pixels that will be treated by an individual thread.
This splitting is done automatically to distribute the computation load equally. We have to convert the pixel index
coordinate to a 2D `[row, col]` coordinate. Also note that we have to keep a reference to the Mat image to be able to
modify the image in-place.
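The skeleton of such a class could look like this (a simplified sketch that just fills the image with a placeholder grayscale value; the full sample also stores the Mandelbrot scale parameters):
@code{.cpp}
class ParallelMandelbrot : public cv::ParallelLoopBody
{
public:
    ParallelMandelbrot (cv::Mat &img) : m_img(img) {}

    virtual void operator ()(const cv::Range &range) const CV_OVERRIDE
    {
        for (int r = range.start; r < range.end; r++)
        {
            // convert the linear pixel index to a 2D [row, col] coordinate
            int i = r / m_img.cols;
            int j = r % m_img.cols;
            uchar value = 0; // placeholder for the computed grayscale value
            m_img.ptr<uchar>(i)[j] = value;
        }
    }

private:
    cv::Mat &m_img; // reference kept so the image can be modified in-place
};
@endcode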
The parallel execution is called with:
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel-call
Here, the range represents the total number of operations to be executed, i.e. the total number of pixels in the image.
To set the number of threads, you can use @ref cv::setNumThreads. You can also specify the number of splits using the
`nstripes` parameter of @ref cv::parallel_for_. For instance, if your processor has 4 threads, setting `cv::setNumThreads(2)`
or setting `nstripes=2` should give the same result, as by default all the available processor threads are used but the
workload is split across only two threads.
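For illustration, both of the following approaches limit the parallel work to two threads/stripes (a sketch, reusing the `ParallelMandelbrot` class sketched above on an assumed image `mandelbrotImg`):
@code{.cpp}
ParallelMandelbrot body(mandelbrotImg);
cv::Range range(0, mandelbrotImg.rows * mandelbrotImg.cols);

// Option 1: limit the global thread pool to two threads
cv::setNumThreads(2);
cv::parallel_for_(range, body);

// Option 2: keep the default thread count but split the work into two stripes
cv::parallel_for_(range, body, 2 /* nstripes */);
@endcode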
@note
The C++11 standard allows the parallel implementation to be simplified by getting rid of the `ParallelMandelbrot` class and replacing it with a lambda expression:
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel-call-cxx11
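In that form, the call could be sketched like this (same simplified grayscale fill as in the class sketch above):
@code{.cpp}
cv::parallel_for_(cv::Range(0, mandelbrotImg.rows * mandelbrotImg.cols),
                  [&](const cv::Range &range)
{
    for (int r = range.start; r < range.end; r++)
    {
        int i = r / mandelbrotImg.cols;
        int j = r % mandelbrotImg.cols;
        mandelbrotImg.ptr<uchar>(i)[j] = 0; // placeholder grayscale value
    }
});
@endcode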
Results
-----------
You can find the full tutorial code [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp).
The performance of the parallel implementation depends on the type of CPU you have. For instance, on a 4 cores / 8 threads
CPU, you can expect a speed-up of around 6.9X. There are many factors that explain why we do not achieve a speed-up of almost 8X.
The main reasons are mostly due to:
* the overhead to create and manage the threads,
* background processes running in parallel,
* the difference between 4 hardware cores with 2 logical threads for each core and 8 hardware cores.
The resulting image produced by the tutorial code (you can modify the code to use more iterations and to assign a pixel
color depending on the escaped iteration, using a color palette to get more aesthetic images):
![Mandelbrot set with xMin=-2.1, xMax=0.6, yMin=-1.2, yMax=1.2, maxIterations=500](images/how_to_use_OpenCV_parallel_for_Mandelbrot.png)

Mask operations on matrices {#tutorial_mat_mask_operations}
===========================
@tableofcontents
@prev_tutorial{tutorial_how_to_scan_images}
@next_tutorial{tutorial_mat_operations}
| | |
| -: | :- |
| Original author | Bernát Gábor |
| Compatibility | OpenCV >= 3.0 |
Mask operations on matrices are quite simple. The idea is that we recalculate each pixel's value in
an image according to a mask matrix (also known as kernel). This mask holds values that will adjust
how much influence neighboring pixels (and the current pixel) have on the new pixel value. From a
mathematical point of view we make a weighted average, with our specified values.
Our test case
-------------
Let's consider the problem of an image contrast enhancement method. Basically, we want to apply the following
formula to every pixel of the image:
\f[I(i,j) = 5*I(i,j) - [ I(i-1,j) + I(i+1,j) + I(i,j-1) + I(i,j+1)]\f]\f[\iff I(i,j)*M, \text{where }
M = \bordermatrix{ _i\backslash ^j & -1 & 0 & +1 \cr
-1 & 0 & -1 & 0 \cr
0 & -1 & 5 & -1 \cr
+1 & 0 & -1 & 0 \cr
}\f]
The first notation uses a formula, while the second is a compacted version of the first, using a mask. You use the
mask by putting the center of the mask matrix (noted above by the zero-zero index) on the pixel you want to
calculate and summing up the pixel values multiplied by the overlapping matrix values. It's the same thing,
however in case of large matrices the latter notation is a lot easier to read.
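In code, the weighted sum for one (non-border) pixel of a single-channel 8-bit image could be sketched like this (illustrative only; `I`, `Result`, `i` and `j` are assumed names, and the tutorial's own implementations follow below):
@code{.cpp}
// apply the contrast-enhancement kernel at pixel (i, j)
int sum = 5 * I.at<uchar>(i, j)
        - I.at<uchar>(i - 1, j) - I.at<uchar>(i + 1, j)
        - I.at<uchar>(i, j - 1) - I.at<uchar>(i, j + 1);
Result.at<uchar>(i, j) = cv::saturate_cast<uchar>(sum); // clamp to [0, 255]
@endcode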
Code
----
@add_toggle_cpp
You can download this source code from [here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp) or look in the
OpenCV source code libraries sample directory at
`samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp`.
@include samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp
@end_toggle
@add_toggle_java
You can download this source code from [here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java) or look in the
OpenCV source code libraries sample directory at
`samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java`.
@include samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java
@end_toggle
@add_toggle_python
You can download this source code from [here
](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py) or look in the
OpenCV source code libraries sample directory at
`samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py`.
@include samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py
@end_toggle
The Basic Method
----------------
Now let us see how we can make this happen by using the basic pixel access method or by using the
**filter2D()** function.
Here's a function that will do this:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp basic_method
First we make sure that the input image's data is in unsigned char format. For this we use the
@ref cv::CV_Assert function, which throws an error when the expression inside it is false.
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp 8_bit
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java basic_method
First we make sure that the input image's data is in unsigned 8-bit format.
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java 8_bit
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py basic_method
First we make sure that the input image's data is in unsigned 8-bit format.
@code{.py}
# ensure the image data is 8-bit unsigned
my_image = np.uint8(my_image)
@endcode
@end_toggle
We create an output image with the same size and the same type as our input. As you can see in the
@ref tutorial_how_to_scan_images_storing "storing" section, depending on the number of channels we may have one or more
subcolumns.
@add_toggle_cpp
We will iterate through them via pointers so the total number of elements depends on
this number.
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp create_channels
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java create_channels
@end_toggle
@add_toggle_python
@code{.py}
height, width, n_channels = my_image.shape
result = np.zeros(my_image.shape, my_image.dtype)
@endcode
@end_toggle
@add_toggle_cpp
We'll use the plain C [] operator to access pixels. Because we need to access multiple rows at the
same time, we'll acquire the pointers for each of them (a previous, a current and a next line). We
need another pointer to where we're going to save the calculation. Then we simply access the right
items with the [] operator. To move the output pointer ahead, we simply increment it (by one
byte) after each operation:
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp basic_method_loop
On the borders of the image the above notation results in nonexistent pixel locations (like (-1,-1)).
At these points our formula is undefined. A simple solution is not to apply the kernel
at these points and, for example, to set the border pixels to zero:
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp borders
@end_toggle
@add_toggle_java
We need to access multiple rows and columns which can be done by adding or subtracting 1 to the current center (i,j).
Then we apply the sum and put the new value in the Result matrix.
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java basic_method_loop
On the borders of the image the above notation results in nonexistent pixel locations (like (-1,-1)).
At these points our formula is undefined. A simple solution is not to apply the kernel
at these points and, for example, to set the border pixels to zero:
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java borders
@end_toggle
@add_toggle_python
We need to access multiple rows and columns which can be done by adding or subtracting 1 to the current center (i,j).
Then we apply the sum and put the new value in the Result matrix.
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py basic_method_loop
@end_toggle
The filter2D function
---------------------
Applying such filters is so common in image processing that OpenCV has a function that
takes care of applying the mask (also called a kernel in some places). For this you first need
to define an object that holds the mask:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp kern
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java kern
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py kern
@end_toggle
Then call the **filter2D()** function specifying the input, the output image and the kernel to
use:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp filter2D
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java filter2D
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py filter2D
@end_toggle
The function even has a fifth optional argument to specify the center of the kernel, a sixth
for adding an optional value to the filtered pixels before storing them in the output, and a seventh
for determining what to do in the regions where the operation is undefined (the borders).
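For illustration, here is a sketch of a call that spells those optional parameters out (the values shown are the defaults):
@code{.cpp}
cv::filter2D(src, dst, src.depth(), kernel,
             cv::Point(-1, -1),    // anchor: (-1,-1) selects the kernel center
             0.0,                  // delta added to each filtered pixel
             cv::BORDER_DEFAULT);  // behavior on the undefined border regions
@endcode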
This function is shorter and less verbose and, because of some internal optimizations, it is usually faster
than the *hand-coded method*. For example, in my test the second one took only 13
milliseconds while the first took around 31 milliseconds. Quite a difference.
For example:
![](images/resultMatMaskFilter2D.png)
@add_toggle_cpp
Check out an instance of running the program on our [YouTube
channel](http://www.youtube.com/watch?v=7PF1tAU9se4) .
@youtube{7PF1tAU9se4}
@end_toggle

View File

@@ -0,0 +1,270 @@
Operations with images {#tutorial_mat_operations}
======================
@tableofcontents
@prev_tutorial{tutorial_mat_mask_operations}
@next_tutorial{tutorial_adding_images}
| | |
| -: | :- |
| Compatibility | OpenCV >= 3.0 |
Input/Output
------------
### Images
Load an image from a file:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Load an image from a file
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Load an image from a file
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Load an image from a file
@end_toggle
If you read a jpg file, a 3 channel image is created by default. If you need a grayscale image, use:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Load an image from a file in grayscale
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Load an image from a file in grayscale
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Load an image from a file in grayscale
@end_toggle
@note The format of the file is determined by its content (first few bytes).

To save an image to a file:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Save image
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Save image
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Save image
@end_toggle
@note The format of the file is determined by its extension.
@note Use cv::imdecode and cv::imencode to read and write an image from/to memory rather than a file.
Basic operations with images
----------------------------
### Accessing pixel intensity values
In order to get a pixel intensity value, you have to know the type of the image and the number of
channels. Here is an example for a single-channel grayscale image (type 8UC1) and pixel coordinates
x and y:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 1
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Pixel access 1
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Pixel access 1
@end_toggle
C++ version only:
`intensity.val[0]` contains a value from 0 to 255. Note the ordering of x and y. Since in OpenCV
images are represented by the same structure as matrices, we use the same convention for both
cases - the 0-based row index (or y-coordinate) goes first and the 0-based column index (or
x-coordinate) follows it. Alternatively, you can use the following notation (**C++ only**):
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 2
Now let us consider a 3 channel image with BGR color ordering (the default format returned by
imread):
**C++ code**
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 3
**Python code**
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Pixel access 3
You can use the same method for floating-point images (for example, you can get such an image by
running Sobel on a 3 channel image) (**C++ only**):
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 4
The same method can be used to change pixel intensities:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Pixel access 5
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Pixel access 5
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Pixel access 5
@end_toggle
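For example, in C++ a BGR pixel can be read and written through a cv::Vec3b (a minimal sketch, assuming `img` is a CV_8UC3 image):
@code{.cpp}
cv::Vec3b bgr = img.at<cv::Vec3b>(y, x); // note: row (y) first, then column (x)
uchar blue  = bgr[0];
uchar green = bgr[1];
uchar red   = bgr[2];
img.at<cv::Vec3b>(y, x) = cv::Vec3b(255, 0, 0); // set the pixel to pure blue
@endcode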
There are functions in OpenCV, especially from the calib3d module, such as cv::projectPoints, that take an
array of 2D or 3D points in the form of a Mat. The matrix should contain exactly one column; each row
corresponds to a point, and the matrix type should be 32FC2 or 32FC3 respectively. Such a matrix can be
easily constructed from a `std::vector` (**C++ only**):
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Mat from points vector
One can access a point in this matrix using the same method `Mat::at` (**C++ only**):
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Point access
### Memory management and reference counting
Mat is a structure that keeps matrix/image characteristics (the number of rows and columns, the data type, etc.)
and a pointer to the data. So nothing prevents us from having several instances of Mat corresponding to
the same data. A Mat keeps a reference count that tells whether the data has to be deallocated when a
particular instance of Mat is destroyed. Here is an example of creating two matrices without copying
data (**C++ only**):
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Reference counting 1
As a result, we get a 32FC1 matrix with 3 columns instead of a 32FC3 matrix with 1 column. `pointsMat`
uses data from `points` and will not deallocate the memory when destroyed. In this particular
instance, however, the developer has to make sure that the lifetime of `points` is longer than that of `pointsMat`.
If we need to copy the data, this is done using, for example, cv::Mat::copyTo or cv::Mat::clone:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Reference counting 2
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Reference counting 2
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Reference counting 2
@end_toggle
An empty output Mat can be supplied to each function.
Each implementation calls Mat::create for a destination matrix.
This method allocates data for a matrix if it is empty.
If it is not empty and has the correct size and type, the method does nothing.
If, however, the size or type are different from the input arguments, the data is deallocated (and lost) and new data is allocated.
For example:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Reference counting 3
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Reference counting 3
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Reference counting 3
@end_toggle
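To make the allocation behavior concrete, consider this sketch (assuming `src` is a valid 8-bit image):
@code{.cpp}
cv::Mat dst;                         // empty output matrix
cv::blur(src, dst, cv::Size(3, 3));  // Mat::create allocates dst here
cv::blur(src, dst, cv::Size(3, 3));  // same size and type: dst is reused
cv::resize(src, dst, cv::Size(src.cols / 2, src.rows / 2)); // new size: dst is reallocated
@endcode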
### Primitive operations
There are a number of convenient operators defined on matrices. For example, here is how we can make
a black image from an existing grayscale image `img`:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Set image to black
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Set image to black
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Set image to black
@end_toggle
Selecting a region of interest:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Select ROI
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Select ROI
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Select ROI
@end_toggle
Conversion from color to greyscale:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp BGR to Gray
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java BGR to Gray
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py BGR to Gray
@end_toggle
Change image type from 8UC1 to 32FC1:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp Convert to CV_32F
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java Convert to CV_32F
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py Convert to CV_32F
@end_toggle
### Visualizing images
It is very useful to see the intermediate results of your algorithm during the development process. OpenCV
provides a convenient way of visualizing images. An 8U image can be shown using:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp imshow 1
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java imshow 1
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py imshow 1
@end_toggle
A call to waitKey() starts a message passing cycle that waits for a keystroke in the "image"
window. A 32F image needs to be converted to 8U type first. For example:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/core/mat_operations/mat_operations.cpp imshow 2
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/core/mat_operations/MatOperations.java imshow 2
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/core/mat_operations/mat_operations.py imshow 2
@end_toggle
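The conversion itself can be done with cv::Mat::convertTo; a sketch, assuming `img32f` holds float values in the [0, 1] range:
@code{.cpp}
cv::Mat img8u;
img32f.convertTo(img8u, CV_8U, 255.0); // scale [0,1] floats into the 8U range [0,255]
cv::imshow("image", img8u);
cv::waitKey();
@endcode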
@note Here cv::namedWindow is not necessary since it is immediately followed by cv::imshow.
Nevertheless, it can be used to change the window properties or when using cv::createTrackbar.

Mat - The Basic Image Container {#tutorial_mat_the_basic_image_container}
===============================
@tableofcontents
@next_tutorial{tutorial_how_to_scan_images}
| | |
| -: | :- |
| Original author | Bernát Gábor |
| Compatibility | OpenCV >= 3.0 |
Goal
----
We have multiple ways to acquire digital images from the real world: digital cameras, scanners,
computed tomography, and magnetic resonance imaging to name a few. In every case what we (humans)
see are images. However, when transforming this to our digital devices what we record are numerical
values for each of the points of the image.
![](images/MatBasicImageForComputer.jpg)
For example, in the above image you can see that the mirror of the car is nothing more than a matrix
containing all the intensity values of the pixel points. How we get and store the pixel values may
vary according to our needs, but in the end all images inside a computer world may be reduced to
numerical matrices and other information describing the matrix itself. *OpenCV* is a computer vision
library whose main focus is to process and manipulate this information. Therefore, the first thing
you need to be familiar with is how OpenCV stores and handles images.
Mat
---
OpenCV has been around since 2001. In those days the library was built around a *C* interface and to
store the image in the memory they used a C structure called *IplImage*. This is the one you'll see
in most of the older tutorials and educational materials. The problem with this is that it brings to
the table all the minuses of the C language. The biggest issue is the manual memory management. It
builds on the assumption that the user is responsible for taking care of memory allocation and
deallocation. While this is not a problem with smaller programs, once your code base grows it will
be more of a struggle to handle all this rather than focusing on solving your development goal.
Luckily C++ came around and introduced the concept of classes, making things easier for the user through
automatic memory management (more or less). The good news is that C++ is fully compatible with C so
no compatibility issues can arise from making the change. Therefore, OpenCV 2.0 introduced a new C++
interface which offered a new way of doing things, meaning you do not need to fiddle with memory
management, making your code concise (less to write, to achieve more). The main downside of the C++
interface is that many embedded development systems at the moment support only C. Therefore, unless
you are targeting embedded platforms, there's no point in using the *old* methods (unless you're a
masochist programmer and you're asking for trouble).
The first thing you need to know about *Mat* is that you no longer need to manually allocate its
memory and release it as soon as you do not need it. While doing this is still a possibility, most
of the OpenCV functions will allocate their output data automatically. As a nice bonus, if you pass an
already existing *Mat* object, which has already allocated the required space for the matrix,
this will be reused. In other words, we use at all times only as much memory as we need to perform
the task.
*Mat* is basically a class with two data parts: the matrix header (containing information such as
the size of the matrix, the method used for storing, at which address the matrix is stored, and so
on) and a pointer to the matrix containing the pixel values (taking any dimensionality depending on
the method chosen for storing). The matrix header size is constant, however the size of the matrix
itself may vary from image to image and usually is larger by orders of magnitude.
OpenCV is an image processing library. It contains a large collection of image processing functions.
To solve a computational challenge, most of the time you will end up using multiple functions of the
library. Because of this, passing images to functions is a common practice. We should not forget
that we are talking about image processing algorithms, which tend to be quite computationally heavy.
The last thing we want to do is further decrease the speed of your program by making unnecessary
copies of potentially *large* images.
To tackle this issue OpenCV uses a reference counting system. The idea is that each *Mat* object has
its own header, however a matrix may be shared between two *Mat* objects by having their matrix
pointers point to the same address. Moreover, the copy operators **will only copy the headers** and
the pointer to the large matrix, not the data itself.
@code{.cpp}
Mat A, C; // creates just the header parts
A = imread(argv[1], IMREAD_COLOR); // here we'll know the method used (allocate matrix)
Mat B(A); // Use the copy constructor
C = A; // Assignment operator
@endcode
All the above objects, in the end, point to the same single data matrix and making a modification
using any of them will affect all the other ones as well. In practice the different objects just
provide different access methods to the same underlying data. Nevertheless, their header parts are
different. The real interesting part is that you can create headers which refer to only a subsection
of the full data. For example, to create a region of interest (*ROI*) in an image you just create
a new header with the new boundaries:
@code{.cpp}
Mat D (A, Rect(10, 10, 100, 100) ); // using a rectangle
Mat E = A(Range::all(), Range(1,3)); // using row and column boundaries
@endcode
Now you may ask: if the matrix itself may belong to multiple *Mat* objects, who takes responsibility
for cleaning it up when it's no longer needed? The short answer is: the last object that used it.
This is handled by using a reference counting mechanism. Whenever somebody copies a header of a
*Mat* object, a counter is increased for the matrix. Whenever a header is cleaned, this counter
is decreased. When the counter reaches zero the matrix is freed. Sometimes you will want to copy
the matrix itself too, so OpenCV provides @ref cv::Mat::clone() and @ref cv::Mat::copyTo() functions.
@code{.cpp}
Mat F = A.clone();
Mat G;
A.copyTo(G);
@endcode
Now modifying *F* or *G* will not affect the matrix pointed to by *A*'s header. What you need to
remember from all this is that:
- Output image allocation for OpenCV functions is automatic (unless specified otherwise).
- You do not need to think about memory management with OpenCV's C++ interface.
- The assignment operator and the copy constructor only copy the header.
- The underlying matrix of an image may be copied using the @ref cv::Mat::clone() and @ref cv::Mat::copyTo()
functions.
Storing methods
-----------------
This is about how you store the pixel values. You can select the color space and the data type used.
The color space refers to how we combine color components in order to code a given color. The
simplest one is the grayscale where the colors at our disposal are black and white. The combination
of these allows us to create many shades of gray.
For *colorful* ways we have a lot more methods to choose from. Each of them breaks a color down into three
or four basic components and we can use combinations of these to create the others. The most
popular one is RGB, mainly because this is also how our eyes build up colors. Its base colors are
red, green and blue. To code the transparency of a color, sometimes a fourth element, alpha (A), is
added.
There are, however, many other color systems each with their own advantages:
- RGB is the most common as our eyes use something similar, however keep in mind that OpenCV standard display
system composes colors using the BGR color space (red and blue channels are swapped places).
- The HSV and HLS color spaces decompose colors into their hue, saturation and value/luminance components,
which is a more natural way for us to describe colors. You might, for example, dismiss the last
component, making your algorithm less sensitive to the lighting conditions of the input image.
- YCrCb is used by the popular JPEG image format.
- CIE L\*a\*b\* is a perceptually uniform color space, which comes in handy if you need to measure
the *distance* of a given color to another color.
Each of the building components has its own valid domain. This leads to the data type used. How
we store a component defines the control we have over its domain. The smallest data type possible is
*char*, which means one byte or 8 bits. This may be unsigned (so it can store values from 0 to 255) or
signed (values from -128 to +127). Although in the case of three components this already gives 16
million possible colors to represent (as in the case of RGB), we may acquire an even finer control by
using the float (4 bytes = 32 bits) or double (8 bytes = 64 bits) data types for each component.
Nevertheless, remember that increasing the size of a component also increases the size of the whole
picture in memory.
Creating a Mat object explicitly
----------------------------------
In the @ref tutorial_load_save_image tutorial you have already learned how to write a matrix to an image
file by using the @ref cv::imwrite() function. However, for debugging purposes it's much more
convenient to see the actual values. You can do this using the \<\< operator of *Mat*. Be aware that
this only works for two dimensional matrices.
Although *Mat* works really well as an image container, it is also a general matrix class.
Therefore, it is possible to create and manipulate multidimensional matrices. You can create a Mat
object in multiple ways:
- @ref cv::Mat::Mat Constructor
@snippet mat_the_basic_image_container.cpp constructor
![](images/MatBasicContainerOut1.png)
For two dimensional and multichannel images we first define their size: row and column count wise.
Then we need to specify the data type to use for storing the elements and the number of channels
per matrix point. To do this we have multiple definitions constructed according to the following
convention:
@code
CV_[The number of bits per item][Signed or Unsigned][Type Prefix]C[The channel number]
@endcode
For instance, *CV_8UC3* means we use unsigned char types that are 8 bits long and each pixel has
three of these to form the three channels. There are types predefined for up to four channels. The
@ref cv::Scalar is a four-element short vector. Specify it and you can initialize all matrix
points with a custom value. If you need more channels, you can create the type with the macro above, setting
the channel number in parentheses as you can see below.
- Use C/C++ arrays and initialize via constructor
@snippet mat_the_basic_image_container.cpp init
The above example shows how to create a matrix with more than two dimensions. Specify the number of
dimensions, then pass a pointer containing the size for each dimension, and the rest remains the
same.
- @ref cv::Mat::create function:
@snippet mat_the_basic_image_container.cpp create
![](images/MatBasicContainerOut2.png)
You cannot initialize the matrix values with this construction. It will only reallocate its matrix
data memory if the new size does not fit into the old one.
- MATLAB style initializer: @ref cv::Mat::zeros , @ref cv::Mat::ones , @ref cv::Mat::eye . Specify size and
data type to use:
@snippet mat_the_basic_image_container.cpp matlab
![](images/MatBasicContainerOut3.png)
- For small matrices you may use comma separated initializers or initializer lists (C++11 support is required in the last case):
@snippet mat_the_basic_image_container.cpp comma
@snippet mat_the_basic_image_container.cpp list
![](images/MatBasicContainerOut6.png)
- Create a new header for an existing *Mat* object and @ref cv::Mat::clone or @ref cv::Mat::copyTo it.
@snippet mat_the_basic_image_container.cpp clone
![](images/MatBasicContainerOut7.png)
@note
You can fill out a matrix with random values using the @ref cv::randu() function. You need to
give a lower and upper limit for the random values:
@snippet mat_the_basic_image_container.cpp random
Output formatting
-----------------
In the above examples you could see the default formatting option. OpenCV, however, allows you to
format your matrix output:
- Default
@snippet mat_the_basic_image_container.cpp out-default
![](images/MatBasicContainerOut8.png)
- Python
@snippet mat_the_basic_image_container.cpp out-python
![](images/MatBasicContainerOut16.png)
- Comma separated values (CSV)
@snippet mat_the_basic_image_container.cpp out-csv
![](images/MatBasicContainerOut10.png)
- Numpy
@snippet mat_the_basic_image_container.cpp out-numpy
![](images/MatBasicContainerOut9.png)
- C
@snippet mat_the_basic_image_container.cpp out-c
![](images/MatBasicContainerOut11.png)
Output of other common items
----------------------------
OpenCV offers support for output of other common OpenCV data structures too via the \<\< operator:
- 2D Point
@snippet mat_the_basic_image_container.cpp out-point2
![](images/MatBasicContainerOut12.png)
- 3D Point
@snippet mat_the_basic_image_container.cpp out-point3
![](images/MatBasicContainerOut13.png)
- std::vector via cv::Mat
@snippet mat_the_basic_image_container.cpp out-vector
![](images/MatBasicContainerOut14.png)
- std::vector of points
@snippet mat_the_basic_image_container.cpp out-vector-points
![](images/MatBasicContainerOut15.png)
Most of the samples here have been included in a small console application. You can download it from
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/core/mat_the_basic_image_container/mat_the_basic_image_container.cpp)
or in the core section of the cpp samples.
You can also find a quick video demonstration of this on
[YouTube](https://www.youtube.com/watch?v=1tibU7vGWpk).
@youtube{1tibU7vGWpk}

The Core Functionality (core module) {#tutorial_table_of_content_core}
=====================================
- @subpage tutorial_mat_the_basic_image_container
- @subpage tutorial_how_to_scan_images
- @subpage tutorial_mat_mask_operations
- @subpage tutorial_mat_operations
- @subpage tutorial_adding_images
- @subpage tutorial_basic_linear_transform
- @subpage tutorial_discrete_fourier_transform
- @subpage tutorial_file_input_output_with_xml_yml
- @subpage tutorial_how_to_use_OpenCV_parallel_for_