init - Initialize project

This commit is contained in:
Lee Nony
2022-05-06 01:58:53 +08:00
commit 90a5cc7cb6
6772 changed files with 2837787 additions and 0 deletions


@@ -0,0 +1,4 @@
Machine Learning (ml module) {#tutorial_table_of_content_ml}
============================
Content has been moved to this page: @ref tutorial_table_of_content_other


@@ -0,0 +1,4 @@
Object Detection (objdetect module) {#tutorial_table_of_content_objdetect}
===================================
Content has been moved to this page: @ref tutorial_table_of_content_other


@@ -0,0 +1,4 @@
Computational photography (photo module) {#tutorial_table_of_content_photo}
========================================
Content has been moved to this page: @ref tutorial_table_of_content_other


@@ -0,0 +1,4 @@
Images stitching (stitching module) {#tutorial_table_of_content_stitching}
===================================
Content has been moved to this page: @ref tutorial_table_of_content_other


@@ -0,0 +1,4 @@
Video analysis (video module) {#tutorial_table_of_content_video}
=============================
Content has been moved to this page: @ref tutorial_table_of_content_other


@@ -0,0 +1,176 @@
How to Use Background Subtraction Methods {#tutorial_background_subtraction}
=========================================
@tableofcontents
@prev_tutorial{tutorial_stitcher}
@next_tutorial{tutorial_meanshift}
| | |
| -: | :- |
| Original author | Domenico Daniele Bloisi |
| Compatibility | OpenCV >= 3.0 |
- Background subtraction (BS) is a common and widely used technique for generating a foreground
mask (namely, a binary image containing the pixels belonging to moving objects in the scene) by
using static cameras.
- As the name suggests, BS calculates the foreground mask performing a subtraction between the
current frame and a background model, containing the static part of the scene or, more in
general, everything that can be considered as background given the characteristics of the
observed scene.
![](images/Background_Subtraction_Tutorial_Scheme.png)
- Background modeling consists of two main steps:
-# Background Initialization;
-# Background Update.
In the first step, an initial model of the background is computed, while in the second step that
model is updated in order to adapt to possible changes in the scene.
- In this tutorial we will learn how to perform BS by using OpenCV.
Goals
-----
In this tutorial you will learn how to:
-# Read data from videos or image sequences by using @ref cv::VideoCapture ;
-# Create and update the background model by using @ref cv::BackgroundSubtractor class;
-# Get and show the foreground mask by using @ref cv::imshow ;
### Code
In the following you can find the source code. We will let the user choose to process either a video
file or a sequence of images.
We will use @ref cv::BackgroundSubtractorMOG2 in this sample, to generate the foreground mask.
The results as well as the input data are shown on the screen.
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/video/bg_sub.cpp)
- **Code at a glance:**
@include samples/cpp/tutorial_code/video/bg_sub.cpp
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/video/background_subtraction/BackgroundSubtractionDemo.java)
- **Code at a glance:**
@include samples/java/tutorial_code/video/background_subtraction/BackgroundSubtractionDemo.java
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/video/background_subtraction/bg_sub.py)
- **Code at a glance:**
@include samples/python/tutorial_code/video/background_subtraction/bg_sub.py
@end_toggle
Explanation
-----------
We discuss the main parts of the code above:
- A @ref cv::BackgroundSubtractor object will be used to generate the foreground mask. In this
example, default parameters are used, but it is also possible to declare specific parameters in
the create function.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/video/bg_sub.cpp create
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/video/background_subtraction/BackgroundSubtractionDemo.java create
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/video/background_subtraction/bg_sub.py create
@end_toggle
- A @ref cv::VideoCapture object is used to read the input video or input images sequence.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/video/bg_sub.cpp capture
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/video/background_subtraction/BackgroundSubtractionDemo.java capture
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/video/background_subtraction/bg_sub.py capture
@end_toggle
- Every frame is used both for calculating the foreground mask and for updating the background. If
you want to change the learning rate used for updating the background model, it is possible to
set a specific learning rate by passing a parameter to the `apply` method (a condensed Python
sketch of the whole loop is given after this list).
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/video/bg_sub.cpp apply
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/video/background_subtraction/BackgroundSubtractionDemo.java apply
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/video/background_subtraction/bg_sub.py apply
@end_toggle
- The current frame number can be extracted from the @ref cv::VideoCapture object and stamped in
the top left corner of the current frame. A white rectangle is used to highlight the black
colored frame number.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/video/bg_sub.cpp display_frame_number
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/video/background_subtraction/BackgroundSubtractionDemo.java display_frame_number
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/video/background_subtraction/bg_sub.py display_frame_number
@end_toggle
- We are ready to show the current input frame and the results.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/video/bg_sub.cpp show
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/video/background_subtraction/BackgroundSubtractionDemo.java show
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/video/background_subtraction/bg_sub.py show
@end_toggle
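Putting these pieces together, here is a condensed Python sketch of the whole loop. It is only a sketch: default MOG2 parameters are assumed, `vtest.avi` is just an example input, and the downloadable samples above remain the reference implementation.
@code{.py}
import cv2 as cv

back_sub = cv.createBackgroundSubtractorMOG2()      # default parameters; a KNN subtractor also exists
capture = cv.VideoCapture("vtest.avi")              # example input video

while True:
    ret, frame = capture.read()
    if not ret:
        break

    # apply() both updates the background model and returns the foreground mask;
    # an optional learningRate argument (0..1, or -1 for automatic) controls the update speed.
    fg_mask = back_sub.apply(frame)

    # Stamp the current frame number (white box, black text) in the top-left corner.
    cv.rectangle(frame, (10, 2), (100, 20), (255, 255, 255), -1)
    frame_number = int(capture.get(cv.CAP_PROP_POS_FRAMES))
    cv.putText(frame, str(frame_number), (15, 15),
               cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0))

    cv.imshow("Frame", frame)
    cv.imshow("FG Mask", fg_mask)
    if cv.waitKey(30) == 27:                        # Esc quits
        break

capture.release()
cv.destroyAllWindows()
@endcode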
Results
-------
- With the `vtest.avi` video, for the following frame:
![](images/Background_Subtraction_Tutorial_frame.jpg)
The output of the program will look like the following for the MOG2 method (gray areas are detected shadows):
![](images/Background_Subtraction_Tutorial_result_MOG2.jpg)
The output of the program will look like the following for the KNN method (gray areas are detected shadows):
![](images/Background_Subtraction_Tutorial_result_KNN.jpg)
References
----------
- [Background Models Challenge (BMC) website](https://web.archive.org/web/20140418093037/http://bmc.univ-bpclermont.fr/)
- A Benchmark Dataset for Foreground/Background Extraction @cite vacavant2013benchmark


@@ -0,0 +1,148 @@
Cascade Classifier {#tutorial_cascade_classifier}
==================
@tableofcontents
@prev_tutorial{tutorial_optical_flow}
@next_tutorial{tutorial_traincascade}
| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |
Goal
----
In this tutorial,
- We will learn how the Haar cascade object detection works.
- We will see the basics of face detection and eye detection using the Haar Feature-based Cascade Classifiers
- We will use the @ref cv::CascadeClassifier class to detect objects in a video stream. Particularly, we
will use the functions:
- @ref cv::CascadeClassifier::load to load a .xml classifier file. It can be either a Haar or a LBP classifier
- @ref cv::CascadeClassifier::detectMultiScale to perform the detection.
Theory
------
Object Detection using Haar feature-based cascade classifiers is an effective object detection
method proposed by Paul Viola and Michael Jones in their paper, "Rapid Object Detection using a
Boosted Cascade of Simple Features" in 2001. It is a machine learning based approach where a cascade
function is trained from a lot of positive and negative images. It is then used to detect objects in
other images.
Here we will work with face detection. Initially, the algorithm needs a lot of positive images
(images of faces) and negative images (images without faces) to train the classifier. Then we need
to extract features from them. For this, the Haar features shown in the image below are used. They are just
like our convolutional kernel. Each feature is a single value obtained by subtracting the sum of pixels
under the white rectangle from the sum of pixels under the black rectangle.
![image](images/haar_features.jpg)
Now, all possible sizes and locations of each kernel are used to calculate lots of features. (Just
imagine how much computation that needs: even a 24x24 window results in over 160000 features.) For each
feature calculation, we need to find the sum of the pixels under the white and black rectangles. To solve
this, they introduced the integral image. However large your image, it reduces the calculation of such a
sum to an operation involving just four pixels. Nice, isn't it? It makes things super-fast.
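To make the "four pixels" claim concrete, here is a small, hedged check with NumPy and cv.integral (the 24x24 window and the rectangle coordinates are arbitrary choices for the example):
@code{.py}
import cv2 as cv
import numpy as np

# Any rectangular sum needs only four lookups in the integral image,
# regardless of the rectangle size.
img = np.random.randint(0, 256, (24, 24), dtype=np.uint8)
ii = cv.integral(img)               # (25 x 25) integral image

x, y, w, h = 5, 5, 10, 8            # arbitrary rectangle inside the window
rect_sum = int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

assert rect_sum == int(img[y:y + h, x:x + w].sum())
@endcode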
But among all these features we calculated, most of them are irrelevant. For example, consider the
image below. The top row shows two good features. The first feature selected seems to focus on the
property that the region of the eyes is often darker than the region of the nose and cheeks. The
second feature selected relies on the property that the eyes are darker than the bridge of the nose.
But the same windows applied to cheeks or any other place are irrelevant. So how do we select the
best features out of 160000+ features? It is achieved by **Adaboost**.
![image](images/haar.png)
For this, we apply each and every feature to all the training images. For each feature, it finds the
best threshold which will classify the images as positive or negative. Obviously, there will be
errors or misclassifications. We select the features with the minimum error rate, which means they are
the features that most accurately classify the face and non-face images. (The process is not as simple as
this. Each image is given an equal weight in the beginning. After each classification, the weights of
misclassified images are increased. Then the same process is repeated: new error rates and new weights
are calculated. The process continues until the required accuracy or error rate is achieved or
the required number of features is found.)
The final classifier is a weighted sum of these weak classifiers. It is called weak because it alone
can't classify the image, but together with others forms a strong classifier. The paper says even
200 features provide detection with 95% accuracy. Their final setup had around 6000 features.
(Imagine a reduction from 160000+ features to 6000 features. That is a big gain).
So now you take an image. Take each 24x24 window. Apply 6000 features to it. Check if it is a face or
not. Wow, isn't that a little inefficient and time consuming? Yes, it is. The authors have a good
solution for that.
In an image, most of the image area is a non-face region. So it is a better idea to have a simple
method to check if a window is not a face region. If it is not, discard it in a single shot, and don't
process it again. Instead, focus on regions where there can be a face. This way, we spend more time
checking possible face regions.
For this they introduced the concept of **Cascade of Classifiers**. Instead of applying all 6000
features on a window, the features are grouped into different stages of classifiers and applied one-by-one.
(Normally the first few stages contain very few features.) If a window fails the first
stage, discard it; we don't consider the remaining features on it. If it passes, apply the second stage
of features and continue the process. The window which passes all stages is a face region. Isn't
that an elegant plan?
The authors' detector had 6000+ features with 38 stages with 1, 10, 25, 25 and 50 features in the first five
stages. (The two features in the above image are actually obtained as the best two features from
Adaboost). According to the authors, on average 10 features out of 6000+ are evaluated per
sub-window.
So this is a simple intuitive explanation of how Viola-Jones face detection works. Read the paper for
more details or check out the references in the Additional Resources section.
Haar-cascade Detection in OpenCV
--------------------------------
OpenCV provides a training method (see @ref tutorial_traincascade) as well as pretrained models, which can be read using the @ref cv::CascadeClassifier::load method.
The pretrained models are located in the data folder in the OpenCV installation or can be found [here](https://github.com/opencv/opencv/tree/master/data).
The following code example will use pretrained Haar cascade models to detect faces and eyes in an image.
First, a @ref cv::CascadeClassifier is created and the necessary XML file is loaded using the @ref cv::CascadeClassifier::load method.
Afterwards, the detection is done using the @ref cv::CascadeClassifier::detectMultiScale method, which returns boundary rectangles for the detected faces or eyes.
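Before the full samples below, here is a condensed Python sketch of just these two calls. The test image name is a placeholder, and the `cv.data.haarcascades` path is an assumption valid for the opencv-python packages; otherwise point to *opencv/data/haarcascades* in your installation.
@code{.py}
import cv2 as cv

face_cascade = cv.CascadeClassifier()
if not face_cascade.load(cv.data.haarcascades + "haarcascade_frontalface_alt.xml"):
    raise IOError("Could not load the cascade file")

img = cv.imread("people.jpg")                      # any test image
gray = cv.equalizeHist(cv.cvtColor(img, cv.COLOR_BGR2GRAY))

faces = face_cascade.detectMultiScale(gray)        # (x, y, w, h) rectangles
for (x, y, w, h) in faces:
    cv.rectangle(img, (x, y), (x + w, y + h), (255, 0, 255), 2)
@endcode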
@add_toggle_cpp
This tutorial's code is shown below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/objectDetection/objectDetection.cpp)
@include samples/cpp/tutorial_code/objectDetection/objectDetection.cpp
@end_toggle
@add_toggle_java
This tutorial's code is shown below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/objectDetection/cascade_classifier/ObjectDetectionDemo.java)
@include samples/java/tutorial_code/objectDetection/cascade_classifier/ObjectDetectionDemo.java
@end_toggle
@add_toggle_python
This tutorial's code is shown below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/objectDetection/cascade_classifier/objectDetection.py)
@include samples/python/tutorial_code/objectDetection/cascade_classifier/objectDetection.py
@end_toggle
Result
------
-# Here is the result of running the code above and using as input the video stream of a built-in
webcam:
![](images/Cascade_Classifier_Tutorial_Result_Haar.jpg)
Make sure the program can find the files *haarcascade_frontalface_alt.xml* and
*haarcascade_eye_tree_eyeglasses.xml*. They are located in
*opencv/data/haarcascades*.
-# This is the result of using the file *lbpcascade_frontalface.xml* (LBP trained) for the face
detection. For the eyes we keep using the file used in the tutorial.
![](images/Cascade_Classifier_Tutorial_Result_LBP.jpg)
Additional Resources
--------------------
-# Paul Viola and Michael J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137-154, 2004. @cite Viola04
-# Rainer Lienhart and Jochen Maydt. An extended set of Haar-like features for rapid object detection. In Image Processing. 2002. Proceedings. 2002 International Conference on, volume 1, pages I-900. IEEE, 2002. @cite Lienhart02
-# Video Lecture on [Face Detection and Tracking](https://www.youtube.com/watch?v=WfdYYNamHZ8)
-# An interesting interview regarding Face Detection by [Adam
Harvey](https://web.archive.org/web/20171204220159/http://www.makematics.com/research/viola-jones/)
-# [OpenCV Face Detection: Visualized](https://vimeo.com/12774628) on Vimeo by Adam Harvey


@@ -0,0 +1,204 @@
High Dynamic Range Imaging {#tutorial_hdr_imaging}
==========================
@tableofcontents
@next_tutorial{tutorial_stitcher}
| | |
| -: | :- |
| Original author | Fedor Morozov |
| Compatibility | OpenCV >= 3.0 |
Introduction
------------
Today most digital images and imaging devices use 8 bits per channel, thus limiting the dynamic range
of the device to two orders of magnitude (actually 256 levels), while the human eye can adapt to
lighting conditions varying by ten orders of magnitude. When we take photographs of a real world
scene, bright regions may be overexposed, while the dark ones may be underexposed, so we can't
capture all details using a single exposure. HDR imaging works with images that use more than 8 bits
per channel (usually 32-bit float values), allowing a much wider dynamic range.
There are different ways to obtain HDR images, but the most common one is to use photographs of the
scene taken with different exposure values. To combine these exposures it is useful to know your
camera's response function, and there are algorithms to estimate it. After the HDR image has been
blended it has to be converted back to 8-bit to view it on usual displays. This process is called
tonemapping. Additional complexities arise when objects of the scene or the camera move between shots,
since images with different exposures should be registered and aligned.
In this tutorial we show how to generate and display an HDR image from an exposure sequence. In our
case the images are already aligned and there are no moving objects. We also demonstrate an alternative
approach called exposure fusion that produces a low dynamic range image. Each step of the HDR pipeline can
be implemented using different algorithms, so take a look at the reference manual to see them all.
Exposure sequence
-----------------
![](images/memorial.png)
Source Code
-----------
@add_toggle_cpp
This tutorial's code is shown below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/photo/hdr_imaging/hdr_imaging.cpp)
@include samples/cpp/tutorial_code/photo/hdr_imaging/hdr_imaging.cpp
@end_toggle
@add_toggle_java
This tutorial's code is shown below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/photo/hdr_imaging/HDRImagingDemo.java)
@include samples/java/tutorial_code/photo/hdr_imaging/HDRImagingDemo.java
@end_toggle
@add_toggle_python
This tutorial's code is shown below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/photo/hdr_imaging/hdr_imaging.py)
@include samples/python/tutorial_code/photo/hdr_imaging/hdr_imaging.py
@end_toggle
Sample images
-------------
The data directory that contains the images, exposure times and the `list.txt` file can be downloaded from
[here](https://github.com/opencv/opencv_extra/tree/master/testdata/cv/hdr/exposures).
Explanation
-----------
- **Load images and exposure times**
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/photo/hdr_imaging/hdr_imaging.cpp Load images and exposure times
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/photo/hdr_imaging/HDRImagingDemo.java Load images and exposure times
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/photo/hdr_imaging/hdr_imaging.py Load images and exposure times
@end_toggle
First we load the input images and exposure times from a user-defined folder. The folder should
contain images and *list.txt*, a file that contains the file names and inverse exposure times.
For our image sequence the list is the following:
@code{.none}
memorial00.png 0.03125
memorial01.png 0.0625
...
memorial15.png 1024
@endcode
- **Estimate camera response**
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/photo/hdr_imaging/hdr_imaging.cpp Estimate camera response
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/photo/hdr_imaging/HDRImagingDemo.java Estimate camera response
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/photo/hdr_imaging/hdr_imaging.py Estimate camera response
@end_toggle
Many HDR construction algorithms require knowledge of the camera response function (CRF).
We use one of the calibration algorithms to estimate the inverse CRF for all 256 pixel values.
- **Make HDR image**
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/photo/hdr_imaging/hdr_imaging.cpp Make HDR image
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/photo/hdr_imaging/HDRImagingDemo.java Make HDR image
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/photo/hdr_imaging/hdr_imaging.py Make HDR image
@end_toggle
We use Debevec's weighting scheme to construct the HDR image, using the response calculated in the
previous step.
- **Tonemap HDR image**
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/photo/hdr_imaging/hdr_imaging.cpp Tonemap HDR image
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/photo/hdr_imaging/HDRImagingDemo.java Tonemap HDR image
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/photo/hdr_imaging/hdr_imaging.py Tonemap HDR image
@end_toggle
Since we want to see our results on a common LDR display, we have to map our HDR image to the 8-bit range
while preserving most details. That is the main goal of tonemapping methods. We use a tonemapper with
bilateral filtering and set 2.2 as the value for gamma correction.
- **Perform exposure fusion**
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/photo/hdr_imaging/hdr_imaging.cpp Perform exposure fusion
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/photo/hdr_imaging/HDRImagingDemo.java Perform exposure fusion
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/photo/hdr_imaging/hdr_imaging.py Perform exposure fusion
@end_toggle
There is an alternative way to merge our exposures in case we don't need an HDR image. This
process is called exposure fusion and produces an LDR image that doesn't require gamma correction. It
also doesn't use the exposure values of the photographs.
- **Write results**
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/photo/hdr_imaging/hdr_imaging.cpp Write results
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/photo/hdr_imaging/HDRImagingDemo.java Write results
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/photo/hdr_imaging/hdr_imaging.py Write results
@end_toggle
Now it's time to look at the results. Note that an HDR image can't be stored in one of the common image
formats, so we save it as a Radiance image (.hdr). Also, all HDR imaging functions return results in the
[0, 1] range, so we should multiply the result by 255.
You can try other tonemap algorithms: cv::TonemapDrago, cv::TonemapMantiuk and cv::TonemapReinhard.
You can also adjust the parameters in the HDR calibration and tonemap methods for your own photos.
A condensed Python sketch of the whole pipeline is given below.
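The sketch below ties the steps above together. It is only a sketch: the three file names and exposure times are placeholders for the full 16-image sequence (whose `list.txt` stores inverse exposure times, hence the `1 / x`), a plain gamma tonemapper stands in for the bilateral-filtering one, and the downloadable samples remain the reference.
@code{.py}
import cv2 as cv
import numpy as np

# Placeholder subset of the sample sequence.
files = ["memorial00.png", "memorial01.png", "memorial02.png"]
times = np.array([1 / 0.03125, 1 / 0.0625, 1 / 0.125], dtype=np.float32)  # exposure times in seconds
images = [cv.imread(f) for f in files]

# Estimate the inverse camera response function (CRF) for all 256 pixel values.
calibrate = cv.createCalibrateDebevec()
response = calibrate.process(images, times)

# Merge the exposures into a 32-bit float HDR image using Debevec's weighting scheme.
merge_debevec = cv.createMergeDebevec()
hdr = merge_debevec.process(images, times, response)

# Map the HDR image to the 8-bit range for display (simple gamma tonemapper shown here).
tonemap = cv.createTonemap(2.2)
ldr = tonemap.process(hdr)

# Alternative: exposure fusion, which needs no exposure times and no gamma correction.
merge_mertens = cv.createMergeMertens()
fusion = merge_mertens.process(images)

cv.imwrite("ldr.png", np.clip(ldr * 255, 0, 255).astype(np.uint8))
cv.imwrite("fusion.png", np.clip(fusion * 255, 0, 255).astype(np.uint8))
cv.imwrite("hdr.hdr", hdr)  # Radiance format keeps the full dynamic range
@endcode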
Results
-------
### Tonemapped image
![](images/ldr.png)
### Exposure fusion
![](images/fusion.png)
Additional Resources
--------------------
1. Paul E Debevec and Jitendra Malik. Recovering high dynamic range radiance maps from photographs. In ACM SIGGRAPH 2008 classes, page 31. ACM, 2008. @cite DM97
2. Mark A Robertson, Sean Borman, and Robert L Stevenson. Dynamic range improvement through multiple exposures. In Image Processing, 1999. ICIP 99. Proceedings. 1999 International Conference on, volume 3, pages 159-163. IEEE, 1999. @cite RB99
3. Tom Mertens, Jan Kautz, and Frank Van Reeth. Exposure fusion. In Computer Graphics and Applications, 2007. PG'07. 15th Pacific Conference on, pages 382-390. IEEE, 2007. @cite MK07
4. [Wikipedia-HDR](https://en.wikipedia.org/wiki/High-dynamic-range_imaging)
5. [Recovering High Dynamic Range Radiance Maps from Photographs (webpage)](http://www.pauldebevec.com/Research/HDR/)


@@ -0,0 +1,217 @@
Introduction to Principal Component Analysis (PCA) {#tutorial_introduction_to_pca}
=======================================
@tableofcontents
@prev_tutorial{tutorial_non_linear_svms}
| | |
| -: | :- |
| Original author | Theodore Tsesmelis |
| Compatibility | OpenCV >= 3.0 |
Goal
----
In this tutorial you will learn how to:
- Use the OpenCV class @ref cv::PCA to calculate the orientation of an object.
What is PCA?
--------------
Principal Component Analysis (PCA) is a statistical procedure that extracts the most important features of a dataset.
![](images/pca_line.png)
Consider that you have a set of 2D points as shown in the figure above. Each dimension corresponds to a feature you are interested in. Here some could argue that the points are placed in a random order. However, if you take a closer look, you will see that there is a linear pattern (indicated by the blue line) which is hard to dismiss. A key concept of PCA is dimensionality reduction, the process of reducing the number of dimensions of the given dataset. For example, in the above case it is possible to approximate the set of points by a single line and therefore reduce the dimensionality of the given points from 2D to 1D.
Moreover, you can also see that the points vary the most along the blue line, more than they vary along the Feature 1 or Feature 2 axes. This means that if you know the position of a point along the blue line you have more information about the point than if you only knew where it was on the Feature 1 axis or the Feature 2 axis.
Hence, PCA allows us to find the direction along which our data varies the most. In fact, the result of running PCA on the set of points in the diagram consists of 2 vectors called _eigenvectors_, which are the _principal components_ of the data set.
![](images/pca_eigen.png)
The size of each eigenvector is encoded in the corresponding eigenvalue and indicates how much the data vary along the principal component. The beginning of the eigenvectors is the center of all points in the data set. Applying PCA to an N-dimensional data set yields N N-dimensional eigenvectors, N eigenvalues and 1 N-dimensional center point. Enough theory, let's see how we can put these ideas into code.
How are the eigenvectors and eigenvalues computed?
--------------------------------------------------
The goal is to transform a given data set __X__ of dimension _p_ to an alternative data set __Y__ of smaller dimension _L_. Equivalently, we are seeking to find the matrix __Y__, where __Y__ is the _Karhunen-Loève transform_ (KLT) of matrix __X__:
\f[ \mathbf{Y} = \mathbb{K} \mathbb{L} \mathbb{T} \{\mathbf{X}\} \f]
__Organize the data set__
Suppose you have data comprising a set of observations of _p_ variables, and you want to reduce the data so that each observation can be described with only _L_ variables, _L_ < _p_. Suppose further, that the data are arranged as a set of _n_ data vectors \f$ x_1...x_n \f$ with each \f$ x_i \f$ representing a single grouped observation of the _p_ variables.
- Write \f$ x_1...x_n \f$ as row vectors, each of which has _p_ columns.
- Place the row vectors into a single matrix __X__ of dimensions \f$ n\times p \f$.
__Calculate the empirical mean__
- Find the empirical mean along each dimension \f$ j = 1, ..., p \f$.
- Place the calculated mean values into an empirical mean vector __u__ of dimensions \f$ p\times 1 \f$.
\f[ \mathbf{u[j]} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{X[i,j]} \f]
__Calculate the deviations from the mean__
Mean subtraction is an integral part of the solution towards finding a principal component basis that minimizes the mean square error of approximating the data. Hence, we proceed by centering the data as follows:
- Subtract the empirical mean vector __u__ from each row of the data matrix __X__.
- Store mean-subtracted data in the \f$ n\times p \f$ matrix __B__.
\f[ \mathbf{B} = \mathbf{X} - \mathbf{h}\mathbf{u^{T}} \f]
where __h__ is an \f$ n\times 1 \f$ column vector of all 1s:
\f[ h[i] = 1, i = 1, ..., n \f]
__Find the covariance matrix__
- Find the \f$ p\times p \f$ empirical covariance matrix __C__ from the outer product of matrix __B__ with itself:
\f[ \mathbf{C} = \frac{1}{n-1} \mathbf{B^{*}} \cdot \mathbf{B} \f]
where * is the conjugate transpose operator. Note that if B consists entirely of real numbers, which is the case in many applications, the "conjugate transpose" is the same as the regular transpose.
__Find the eigenvectors and eigenvalues of the covariance matrix__
- Compute the matrix __V__ of eigenvectors which diagonalizes the covariance matrix __C__:
\f[ \mathbf{V^{-1}} \mathbf{C} \mathbf{V} = \mathbf{D} \f]
where __D__ is the diagonal matrix of eigenvalues of __C__.
- Matrix __D__ will take the form of a \f$ p \times p \f$ diagonal matrix:
\f[ D[k,l] = \left\{\begin{matrix} \lambda_k, k = l \\ 0, k \neq l \end{matrix}\right. \f]
here, \f$ \lambda_k \f$ is the _k_-th eigenvalue of the covariance matrix __C__
- Matrix __V__, also of dimension _p_ x _p_, contains _p_ column vectors, each of length _p_, which represent the _p_ eigenvectors of the covariance matrix __C__.
- The eigenvalues and eigenvectors are ordered and paired. The _j_-th eigenvalue corresponds to the _j_-th eigenvector.
@note sources [[1]](https://robospace.wordpress.com/2013/10/09/object-orientation-principal-component-analysis-opencv/), [[2]](http://en.wikipedia.org/wiki/Principal_component_analysis) and special thanks to Svetlin Penkov for the original tutorial.
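The recipe above maps almost line for line onto NumPy. A toy 2D data set is used purely for illustration:
@code{.py}
import numpy as np

# n = 5 samples, p = 2 features
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

u = X.mean(axis=0)                  # empirical mean, shape (p,)
B = X - u                           # mean-subtracted data, shape (n, p)
C = B.T @ B / (len(X) - 1)          # empirical covariance matrix, shape (p, p)

eigenvalues, V = np.linalg.eigh(C)  # C is symmetric; columns of V are eigenvectors
order = np.argsort(eigenvalues)[::-1]
eigenvalues, V = eigenvalues[order], V[:, order]   # sort by decreasing variance
@endcode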
Source Code
-----------
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp)
- **Code at a glance:**
@include samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java)
- **Code at a glance:**
@include samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py)
- **Code at a glance:**
@include samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py
@end_toggle
@note Another example using PCA for dimensionality reduction while maintaining an amount of variance can be found at [opencv_source_code/samples/cpp/pca.cpp](https://github.com/opencv/opencv/tree/master/samples/cpp/pca.cpp)
Explanation
-----------
- __Read image and convert it to binary__
Here we apply the necessary pre-processing procedures in order to be able to detect the objects of interest.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp pre-process
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java pre-process
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py pre-process
@end_toggle
- __Extract objects of interest__
Then find and filter contours by size and obtain the orientation of the remaining ones.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp contours
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java contours
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py contours
@end_toggle
- __Extract orientation__
The orientation is extracted by calling the getOrientation() function, which performs the entire PCA procedure.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp pca
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java pca
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py pca
@end_toggle
First the data need to be arranged in a matrix of size n x 2, where n is the number of data points we have. Then we can perform the PCA analysis. The calculated mean (i.e. the center of mass) is stored in the _cntr_ variable, and the eigenvectors and eigenvalues are stored in the corresponding std::vectors. A compact Python equivalent of this step is sketched after this list.
- __Visualize result__
The final result is visualized through the drawAxis() function, where the principal components are drawn as lines, and each eigenvector is multiplied by its eigenvalue and translated to the mean position.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp visualization
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java visualization
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py visualization
@end_toggle
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp visualization1
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java visualization1
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py visualization1
@end_toggle
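For reference, the core of getOrientation() can be sketched in Python with cv::PCACompute2 (the helper function name here is illustrative, not part of the sample):
@code{.py}
import cv2 as cv
import numpy as np

def orientation_from_contour(contour):
    """Run PCA on the points of one contour and derive its orientation."""
    data_pts = contour.reshape(-1, 2).astype(np.float64)        # n x 2 matrix of points
    mean, eigenvectors, eigenvalues = cv.PCACompute2(data_pts, np.empty(0))
    center = (int(mean[0, 0]), int(mean[0, 1]))                 # center of mass
    angle = np.arctan2(eigenvectors[0, 1], eigenvectors[0, 0])  # orientation in radians
    return center, angle
@endcode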
Results
-------
The code opens an image, finds the orientation of the detected objects of interest and then visualizes the result by drawing the contours of the detected objects of interest, the center point, and the x and y axes according to the extracted orientation.
![](images/pca_test1.jpg)
![](images/pca_output.png)


@@ -0,0 +1,273 @@
Introduction to Support Vector Machines {#tutorial_introduction_to_svm}
=======================================
@tableofcontents
@prev_tutorial{tutorial_traincascade}
@next_tutorial{tutorial_non_linear_svms}
| | |
| -: | :- |
| Original author | Fernando Iglesias García |
| Compatibility | OpenCV >= 3.0 |
Goal
----
In this tutorial you will learn how to:
- Use the OpenCV functions @ref cv::ml::SVM::train to build a classifier based on SVMs and @ref
cv::ml::SVM::predict to test its performance.
What is a SVM?
--------------
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating
hyperplane. In other words, given labeled training data (*supervised learning*), the algorithm
outputs an optimal hyperplane which categorizes new examples.
In which sense is the hyperplane obtained optimal? Let's consider the following simple problem:
For a linearly separable set of 2D-points which belong to one of two classes, find a separating
straight line.
![](images/separating-lines.png)
@note In this example we deal with lines and points in the Cartesian plane instead of hyperplanes
and vectors in a high dimensional space. This is a simplification of the problem. It is important to
understand that this is done only because our intuition is better built from examples that are easy
to imagine. However, the same concepts apply to tasks where the examples to classify lie in a space
whose dimension is higher than two.
In the above picture you can see that there exist multiple lines that offer a solution to the
problem. Is any of them better than the others? We can intuitively define a criterion to estimate
the worth of the lines: <em> A line is bad if it passes too close to the points because it will be
noise sensitive and it will not generalize correctly. </em> Therefore, our goal should be to find
the line passing as far as possible from all points.
The operation of the SVM algorithm is based on finding the hyperplane that gives the largest
minimum distance to the training examples. Twice this distance is known as the **margin** in SVM
theory. Therefore, the optimal separating hyperplane *maximizes* the margin of the training data.
![](images/optimal-hyperplane.png)
How is the optimal hyperplane computed?
---------------------------------------
Let's introduce the notation used to define formally a hyperplane:
\f[f(x) = \beta_{0} + \beta^{T} x,\f]
where \f$\beta\f$ is known as the *weight vector* and \f$\beta_{0}\f$ as the *bias*.
@note A more in-depth description of this and hyperplanes can be found in section 4.5 (*Separating
Hyperplanes*) of the book *The Elements of Statistical Learning* by T. Hastie, R. Tibshirani and J. H.
Friedman (@cite HTF01).
The optimal hyperplane can be represented in an infinite number of different ways by
scaling of \f$\beta\f$ and \f$\beta_{0}\f$. As a matter of convention, among all the possible
representations of the hyperplane, the one chosen is
\f[|\beta_{0} + \beta^{T} x| = 1\f]
where \f$x\f$ symbolizes the training examples closest to the hyperplane. In general, the training
examples that are closest to the hyperplane are called **support vectors**. This representation is
known as the **canonical hyperplane**.
Now, we use the result of geometry that gives the distance between a point \f$x\f$ and a hyperplane
\f$(\beta, \beta_{0})\f$:
\f[\mathrm{distance} = \frac{|\beta_{0} + \beta^{T} x|}{||\beta||}.\f]
In particular, for the canonical hyperplane, the numerator is equal to one and the distance to the
support vectors is
\f[\mathrm{distance}_{\text{ support vectors}} = \frac{|\beta_{0} + \beta^{T} x|}{||\beta||} = \frac{1}{||\beta||}.\f]
Recall that the margin introduced in the previous section, here denoted as \f$M\f$, is twice the
distance to the closest examples:
\f[M = \frac{2}{||\beta||}\f]
Finally, the problem of maximizing \f$M\f$ is equivalent to the problem of minimizing a function
\f$L(\beta)\f$ subject to some constraints. The constraints model the requirement for the hyperplane to
classify correctly all the training examples \f$x_{i}\f$. Formally,
\f[\min_{\beta, \beta_{0}} L(\beta) = \frac{1}{2}||\beta||^{2} \text{ subject to } y_{i}(\beta^{T} x_{i} + \beta_{0}) \geq 1 \text{ } \forall i,\f]
where \f$y_{i}\f$ represents each of the labels of the training examples.
This is a problem of Lagrangian optimization that can be solved using Lagrange multipliers to obtain
the weight vector \f$\beta\f$ and the bias \f$\beta_{0}\f$ of the optimal hyperplane.
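To see how this optimization is driven in practice through OpenCV's ml module, here is a minimal Python sketch (the four toy points and labels are only illustrative; the full samples below are the reference):
@code{.py}
import cv2 as cv
import numpy as np

# Toy training set: four 2D points with labels +1 / -1.
training_data = np.array([[501, 10], [255, 10], [501, 255], [10, 501]], dtype=np.float32)
labels = np.array([1, -1, -1, -1], dtype=np.int32)

svm = cv.ml.SVM_create()
svm.setType(cv.ml.SVM_C_SVC)
svm.setKernel(cv.ml.SVM_LINEAR)
svm.setTermCriteria((cv.TERM_CRITERIA_MAX_ITER, 100, 1e-6))
svm.train(training_data, cv.ml.ROW_SAMPLE, labels)

sample = np.array([[300, 200]], dtype=np.float32)
response = svm.predict(sample)[1]                 # predicted label for the query point
support_vectors = svm.getUncompressedSupportVectors()  # original training points acting as support vectors
@endcode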
Source Code
-----------
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp)
- **Code at a glance:**
@include samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java)
- **Code at a glance:**
@include samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py)
- **Code at a glance:**
@include samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py
@end_toggle
Explanation
-----------
- **Set up the training data**
The training data of this exercise is formed by a set of labeled 2D-points that belong to one of
two different classes; one of the classes consists of one point and the other of three points.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp setup1
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java setup1
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py setup1
@end_toggle
The function @ref cv::ml::SVM::train that will be used afterwards requires the training data to be
stored as @ref cv::Mat objects of floats. Therefore, we create these objects from the arrays
defined above:
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp setup2
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java setup2
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py setup1
@end_toggle
- **Set up SVM's parameters**
In this tutorial we have introduced the theory of SVMs in the simplest case, when the
training examples are split into two classes that are linearly separable. However, SVMs can be
used in a wide variety of problems (e.g. problems with non-linearly separable data, or an SVM using
a kernel function to raise the dimensionality of the examples, etc.). As a consequence of this,
we have to define some parameters before training the SVM. These parameters are stored in an
object of the class @ref cv::ml::SVM.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp init
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java init
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py init
@end_toggle
Here:
- *Type of SVM*. We choose here the type @ref cv::ml::SVM::C_SVC "C_SVC" that can be used for
n-class classification (n \f$\geq\f$ 2). The important feature of this type is that it deals
with imperfect separation of classes (i.e. when the training data is non-linearly separable).
This feature is not important here since the data is linearly separable and we chose this SVM
type only because it is the most commonly used.
- *Type of SVM kernel*. We have not talked about kernel functions since they are not
needed for the training data we are dealing with. Nevertheless, let's briefly explain
the main idea behind a kernel function. It is a mapping applied to the training data to improve
its resemblance to a linearly separable set of data. This mapping consists of increasing the
dimensionality of the data and is done efficiently using a kernel function. We choose here the
type @ref cv::ml::SVM::LINEAR "LINEAR" which means that no mapping is done. This parameter is
defined using cv::ml::SVM::setKernel.
- *Termination criteria of the algorithm*. The SVM training procedure is implemented solving a
constrained quadratic optimization problem in an **iterative** fashion. Here we specify a
maximum number of iterations and a tolerance error so we allow the algorithm to finish in
fewer steps even if the optimal hyperplane has not been computed yet. This
parameter is defined in a @ref cv::TermCriteria structure.
- **Train the SVM**
We call the method @ref cv::ml::SVM::train to build the SVM model.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp train
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java train
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py train
@end_toggle
- **Regions classified by the SVM**
The method @ref cv::ml::SVM::predict is used to classify an input sample using a trained SVM. In
this example we have used this method in order to color the space depending on the prediction done
by the SVM. In other words, an image is traversed interpreting its pixels as points of the
Cartesian plane. Each of the points is colored depending on the class predicted by the SVM; in
green if it is the class with label 1 and in blue if it is the class with label -1.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp show
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java show
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py show
@end_toggle
- **Support vectors**
We use here a couple of methods to obtain information about the support vectors.
The method @ref cv::ml::SVM::getSupportVectors obtains all of the support
vectors. We have used this method here to find the training examples that are
support vectors and to highlight them.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp show_vectors
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java show_vectors
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py show_vectors
@end_toggle
Results
-------
- The code opens an image and shows the training examples of both classes. The points of one class
are represented with white circles and black ones are used for the other class.
- The SVM is trained and used to classify all the pixels of the image. This results in a division
of the image in a blue region and a green region. The boundary between both regions is the
optimal separating hyperplane.
- Finally the support vectors are shown using gray rings around the training examples.
![](images/svm_intro_result.png)


@@ -0,0 +1,139 @@
Meanshift and Camshift {#tutorial_meanshift}
======================
@tableofcontents
@prev_tutorial{tutorial_background_subtraction}
@next_tutorial{tutorial_optical_flow}
Goal
----
In this chapter,
- We will learn about the Meanshift and Camshift algorithms to track objects in videos.
Meanshift
---------
The intuition behind meanshift is simple. Consider you have a set of points. (It can be a pixel
distribution like a histogram backprojection.) You are given a small window (maybe a circle) and you
have to move that window to the area of maximum pixel density (or maximum number of points). It is
illustrated in the simple image given below:
![image](images/meanshift_basics.jpg)
The initial window is shown as the blue circle named "C1". Its original center is marked by the blue
rectangle named "C1_o". But if you find the centroid of the points inside that window, you will
get the point "C1_r" (marked by the small blue circle), which is the real centroid of the window. Surely
they don't match. So move your window such that the circle of the new window matches the previous
centroid. Again find the new centroid. Most probably, it won't match. So move it again, and continue
the iterations until the center of the window and its centroid fall on the same location (or within a
small desired error). What you finally obtain is a window with maximum pixel distribution. It is
marked with the green circle named "C2". As you can see in the image, it has the maximum number of points. The
whole process is demonstrated on a static image below:
![image](images/meanshift_face.gif)
So we normally pass the histogram backprojected image and initial target location. When the object
moves, obviously the movement is reflected in the histogram backprojected image. As a result, the meanshift
algorithm moves our window to the new location with maximum density.
### Meanshift in OpenCV
To use meanshift in OpenCV, we first need to set up the target and find its histogram so that we can
backproject the target on each frame for the meanshift calculation. We also need to provide an initial
location of the window. For the histogram, only Hue is considered here. Also, to avoid false values due to
low light, low-light values are discarded using the **cv.inRange()** function.
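A condensed Python sketch of this setup and the tracking loop follows; the video name and the initial window coordinates are just examples, and the downloadable samples below are the reference.
@code{.py}
import cv2 as cv
import numpy as np

cap = cv.VideoCapture("slow_traffic_small.mp4")     # example input
ret, frame = cap.read()
x, y, w, h = 300, 200, 100, 50                      # initial location of the window
track_window = (x, y, w, h)

# Hue histogram of the region of interest, with low-light values masked out.
roi = frame[y:y + h, x:x + w]
hsv_roi = cv.cvtColor(roi, cv.COLOR_BGR2HSV)
mask = cv.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv.normalize(roi_hist, roi_hist, 0, 255, cv.NORM_MINMAX)

term_crit = (cv.TERM_CRITERIA_EPS | cv.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    hsv = cv.cvtColor(frame, cv.COLOR_BGR2HSV)
    dst = cv.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    ret, track_window = cv.meanShift(dst, track_window, term_crit)
    x, y, w, h = track_window
    cv.rectangle(frame, (x, y), (x + w, y + h), 255, 2)
    cv.imshow("img2", frame)
    if cv.waitKey(30) == 27:
        break
@endcode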
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/video/meanshift/meanshift.cpp)
- **Code at a glance:**
@include samples/cpp/tutorial_code/video/meanshift/meanshift.cpp
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/video/meanshift/meanshift.py)
- **Code at a glance:**
@include samples/python/tutorial_code/video/meanshift/meanshift.py
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/video/meanshift/MeanshiftDemo.java)
- **Code at a glance:**
@include samples/java/tutorial_code/video/meanshift/MeanshiftDemo.java
@end_toggle
Three frames of the video I used are given below:
![image](images/meanshift_result.jpg)
Camshift
--------
Did you closely watch the last result? There is a problem. Our window always has the same size whether
the car is very far or very close to the camera. That is not good. We need to adapt the window
size to the size and rotation of the target. Once again, the solution came from "OpenCV Labs": it
is called CAMshift (Continuously Adaptive Meanshift), published by Gary Bradski in his paper
"Computer Vision Face Tracking for Use in a Perceptual User Interface" in 1998 @cite Bradski98 .
It applies meanshift first. Once meanshift converges, it updates the size of the window as
\f$s = 2 \times \sqrt{\frac{M_{00}}{256}}\f$. It also calculates the orientation of the best-fitting ellipse
to it. Again it applies meanshift with the new scaled search window and the previous window location.
The process continues until the required accuracy is met.
![image](images/camshift_face.gif)
### Camshift in OpenCV
It is similar to meanshift, but it returns a rotated rectangle (that is our result) and box
parameters (which are passed as the search window in the next iteration). See the code below:
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/video/meanshift/camshift.cpp)
- **Code at a glance:**
@include samples/cpp/tutorial_code/video/meanshift/camshift.cpp
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/video/meanshift/camshift.py)
- **Code at a glance:**
@include samples/python/tutorial_code/video/meanshift/camshift.py
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/video/meanshift/CamshiftDemo.java)
- **Code at a glance:**
@include samples/java/tutorial_code/video/meanshift/CamshiftDemo.java
@end_toggle
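In Python, only the tracking call and the drawing change with respect to the meanshift loop sketched earlier. A hedged sketch of the per-frame step (the function name is illustrative):
@code{.py}
import cv2 as cv
import numpy as np

def camshift_step(back_projection, track_window, term_crit, frame):
    """One CamShift iteration: update the window and draw the rotated box."""
    rot_rect, track_window = cv.CamShift(back_projection, track_window, term_crit)
    pts = cv.boxPoints(rot_rect).astype(np.int32)   # 4 corners of the rotated rectangle
    cv.polylines(frame, [pts], True, 255, 2)
    return track_window
@endcode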
Three frames of the result are shown below:
![image](images/camshift_result.jpg)
Additional Resources
--------------------
-# French Wikipedia page on [Camshift](http://fr.wikipedia.org/wiki/Camshift). (The two animations
are taken from there)
-# Bradski, G.R., "Real time face and object tracking as a component of a perceptual user
interface," Applications of Computer Vision, 1998. WACV '98. Proceedings., Fourth IEEE Workshop
on , vol., no., pp.214,219, 19-21 Oct 1998
Exercises
---------
-# OpenCV comes with a Python [sample](https://github.com/opencv/opencv/blob/master/samples/python/camshift.py) for an interactive demo of camshift. Use it, hack it, understand
it.


@@ -0,0 +1,288 @@
Support Vector Machines for Non-Linearly Separable Data {#tutorial_non_linear_svms}
=======================================================
@tableofcontents
@prev_tutorial{tutorial_introduction_to_svm}
@next_tutorial{tutorial_introduction_to_pca}
| | |
| -: | :- |
| Original author | Fernando Iglesias García |
| Compatibility | OpenCV >= 3.0 |
Goal
----
In this tutorial you will learn how to:
- Define the optimization problem for SVMs when it is not possible to linearly separate the
training data.
- Configure the parameters to adapt your SVM to this class of problems.
Motivation
----------
Why is it interesting to extend the SVM optimization problem in order to handle non-linearly separable
training data? Most of the applications in which SVMs are used in computer vision require a more
powerful tool than a simple linear classifier. This stems from the fact that in these tasks __the
training data can rarely be separated using a hyperplane__.
Consider one of these tasks, for example, face detection. The training data in this case is composed
of a set of images that are faces and another set of images that are non-faces (_everything else
in the world except faces_). This training data is too complex for us to find a representation
of each sample (_feature vector_) that could make the whole set of faces linearly separable from the
whole set of non-faces.
Extension of the Optimization Problem
-------------------------------------
Remember that using SVMs we obtain a separating hyperplane. Therefore, since the training data is
now non-linearly separable, we must admit that the hyperplane found will misclassify some of the
samples. This _misclassification_ is a new variable in the optimization that must be taken into
account. The new model has to include both the old requirement of finding the hyperplane that gives
the biggest margin and the new one of generalizing the training data correctly by not allowing too
many classification errors.
We start here from the formulation of the optimization problem of finding the hyperplane which
maximizes the __margin__ (this is explained in the previous tutorial @ref tutorial_introduction_to_svm):
\f[\min_{\beta, \beta_{0}} L(\beta) = \frac{1}{2}||\beta||^{2} \text{ subject to } y_{i}(\beta^{T} x_{i} + \beta_{0}) \geq 1 \text{ } \forall i\f]
There are multiple ways in which this model can be modified so it takes into account the
misclassification errors. For example, one could think of minimizing the same quantity plus a
constant times the number of misclassification errors in the training data, i.e.:
\f[\min ||\beta||^{2} + C \text{(misclassification errors)}\f]
However, this one is not a very good solution since, among other reasons, we do not distinguish
between samples that are misclassified by a small distance from their appropriate decision region and
samples that are misclassified by a large one. Therefore, a better solution will take into account the _distance of the
misclassified samples to their correct decision regions_, i.e.:
\f[\min ||\beta||^{2} + C \text{(distance of misclassified samples to their correct regions)}\f]
For each sample of the training data a new parameter \f$\xi_{i}\f$ is defined. Each one of these
parameters contains the distance from its corresponding training sample to its correct decision
region. The following picture shows non-linearly separable training data from two classes, a
separating hyperplane and the distances of the misclassified samples to their correct regions.
![](images/sample-errors-dist.png)
@note Only the distances of the samples that are misclassified are shown in the picture. The
distances of the rest of the samples are zero since they already lie in their correct decision
region.
The red and blue lines that appear on the picture are the margins to each one of the
decision regions. It is very __important__ to realize that each of the \f$\xi_{i}\f$ goes from a
misclassified training sample to the margin of its appropriate region.
Finally, the new formulation for the optimization problem is:
\f[\min_{\beta, \beta_{0}} L(\beta) = ||\beta||^{2} + C \sum_{i} {\xi_{i}} \text{ subject to } y_{i}(\beta^{T} x_{i} + \beta_{0}) \geq 1 - \xi_{i} \text{ and } \xi_{i} \geq 0 \text{ } \forall i\f]
How should the parameter C be chosen? Clearly, the answer depends on how the training data is
distributed. Although there is no general answer, it is useful to take the following rules into
account (a small sketch of their effect follows the list):
- Large values of C give solutions with _fewer misclassification errors_ but a _smaller margin_.
  Consider that in this case it is expensive to make misclassification errors. Since the aim of
  the optimization is to minimize the argument, few misclassification errors are allowed.
- Small values of C give solutions with a _bigger margin_ but _more classification errors_. In this
  case the minimization pays less attention to the sum term, so it focuses more on finding a
  hyperplane with a big margin.
Source Code
-----------
You may also find the source code in `samples/cpp/tutorial_code/ml/non_linear_svms` folder of the OpenCV source library or
[download it from here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp).
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp)
- **Code at glance:**
@include samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java)
- **Code at glance:**
@include samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py)
- **Code at glance:**
@include samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py
@end_toggle
Explanation
-----------
- __Set up the training data__
The training data of this exercise consists of a set of labeled 2D points that belong to one of
two different classes. To make the exercise more appealing, the training data is generated
randomly using uniform probability density functions (PDFs).
We have divided the generation of the training data into two main parts.
In the first part we generate data for both classes that is linearly separable.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp setup1
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java setup1
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py setup1
@end_toggle
In the second part we create data for both classes that is non-linearly separable, data that
overlaps.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp setup2
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java setup2
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py setup2
@end_toggle
- __Set up SVM's parameters__
@note In the previous tutorial @ref tutorial_introduction_to_svm there is an explanation of the
attributes of the class @ref cv::ml::SVM that we configure here before training the SVM.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp init
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java init
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py init
@end_toggle
There are just two differences between the configuration we do here and the one that was done in
the previous tutorial (@ref tutorial_introduction_to_svm) that we use as reference.
- _C_. We chose a small value of this parameter here in order not to punish the misclassification
  errors too much in the optimization. We did this because we want to obtain a solution close to the
  one intuitively expected. However, we recommend getting a better insight into the problem by
  making adjustments to this parameter.
@note In this case there are just a few points in the overlapping region between the classes.
By giving a smaller value to __FRAC_LINEAR_SEP__ the density of points can be increased and the
impact of the parameter _C_ explored more deeply.
- _Termination Criteria of the algorithm_. The maximum number of iterations has to be
  increased considerably in order to correctly solve a problem with non-linearly separable
  training data. In particular, we have increased this value by five orders of magnitude.
- __Train the SVM__
We call the method @ref cv::ml::SVM::train to build the SVM model. Be aware that the training
process may take quite a long time. Have patience when you run the program.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp train
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java train
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py train
@end_toggle
- __Show the Decision Regions__
The method @ref cv::ml::SVM::predict is used to classify an input sample using a trained SVM. In
this example we have used this method in order to color the space depending on the prediction done
by the SVM. In other words, an image is traversed interpreting its pixels as points of the
Cartesian plane. Each of the points is colored depending on the class predicted by the SVM; in
dark green if it is the class with label 1 and in dark blue if it is the class with label 2.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp show
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java show
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py show
@end_toggle
- __Show the training data__
The method @ref cv::circle is used to show the samples that compose the training data. The samples
of the class labeled with 1 are shown in light green, and the samples of the class labeled with 2
in light blue.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp show_data
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java show_data
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py show_data
@end_toggle
- __Support vectors__
We use here a couple of methods to obtain information about the support vectors. The method
@ref cv::ml::SVM::getSupportVectors obtains all of the support vectors. We have used this method
here to find the training examples that are support vectors and highlight them.
@add_toggle_cpp
@snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp show_vectors
@end_toggle
@add_toggle_java
@snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java show_vectors
@end_toggle
@add_toggle_python
@snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py show_vectors
@end_toggle
Results
-------
- The code opens an image and shows the training examples of both classes. The points of one class
  are represented in light green, and light blue is used for the points of the other class.
- The SVM is trained and used to classify all the pixels of the image. This results in a division
  of the image into a blue region and a green region. The boundary between both regions is the
  separating hyperplane. Since the training data is non-linearly separable, it can be seen that
  some of the examples of both classes are misclassified; some green points lie in the blue region
  and some blue points lie in the green one.
- Finally the support vectors are shown using gray rings around the training examples.
![](images/svm_non_linear_result.png)
You may observe a runtime instance of this [on YouTube](https://www.youtube.com/watch?v=vFv2yPcSo-Q).
@youtube{vFv2yPcSo-Q}


@@ -0,0 +1,179 @@
Optical Flow {#tutorial_optical_flow}
============
@tableofcontents
@prev_tutorial{tutorial_meanshift}
@next_tutorial{tutorial_cascade_classifier}
Goal
----
In this chapter,
- We will understand the concepts of optical flow and its estimation using Lucas-Kanade
method.
- We will use functions like **cv.calcOpticalFlowPyrLK()** to track feature points in a
video.
- We will create a dense optical flow field using the **cv.calcOpticalFlowFarneback()** method.
Optical Flow
------------
Optical flow is the pattern of apparent motion of image objects between two consecutive frames
caused by the movement of the object or the camera. It is a 2D vector field where each vector is a
displacement vector showing the movement of points from the first frame to the second. Consider the
image below (Image Courtesy: [Wikipedia article on Optical Flow](http://en.wikipedia.org/wiki/Optical_flow)).
![image](images/optical_flow_basic1.jpg)
It shows a ball moving in 5 consecutive frames. The arrow shows its displacement vector. Optical
flow has many applications in areas like:
- Structure from Motion
- Video Compression
- Video Stabilization ...
Optical flow works on several assumptions:
-# The pixel intensities of an object do not change between consecutive frames.
-# Neighbouring pixels have similar motion.
Consider a pixel \f$I(x,y,t)\f$ in the first frame (note that a new dimension, time, is added here;
earlier we were working with images only, so there was no need for time). It moves by a distance
\f$(dx,dy)\f$ in the next frame, taken after time \f$dt\f$. Since those pixels are the same and the
intensity does not change, we can say,
\f[I(x,y,t) = I(x+dx, y+dy, t+dt)\f]
Then take the first-order Taylor series approximation of the right-hand side, remove the common
terms and divide by \f$dt\f$ to get the following equation:
\f[f_x u + f_y v + f_t = 0 \;\f]
where:
\f[f_x = \frac{\partial f}{\partial x} \; ; \; f_y = \frac{\partial f}{\partial y}\f]\f[u = \frac{dx}{dt} \; ; \; v = \frac{dy}{dt}\f]
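Spelling out the Taylor step: the first-order expansion of the right-hand side is
\f[I(x+dx, y+dy, t+dt) \approx I(x,y,t) + \frac{\partial I}{\partial x}dx + \frac{\partial I}{\partial y}dy + \frac{\partial I}{\partial t}dt\f]
and since the left-hand side equals \f$I(x,y,t)\f$, the three partial-derivative terms must sum to
zero; dividing them by \f$dt\f$ gives exactly \f$f_x u + f_y v + f_t = 0\f$.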
The above equation is called the Optical Flow equation. In it, \f$f_x\f$ and \f$f_y\f$ are the image
gradients and \f$f_t\f$ is the gradient along time; these can be computed. But \f$(u,v)\f$ is
unknown, and we cannot solve this single equation with two unknown variables. Several methods have
been proposed to solve this problem, and one of them is Lucas-Kanade.
### Lucas-Kanade method
We have seen an assumption before, that all the neighbouring pixels have similar motion. The
Lucas-Kanade method takes a 3x3 patch around the point, so all 9 points are assumed to have the same
motion. We can find \f$(f_x, f_y, f_t)\f$ for these 9 points, so our problem becomes solving 9
equations with two unknown variables, which is over-determined. A better solution is obtained with
the least-squares fit method. Below is the final solution, a two-equation, two-unknown problem that
we solve to get \f$(u,v)\f$:
\f[\begin{bmatrix} u \\ v \end{bmatrix} =
\begin{bmatrix}
\sum_{i}{f_{x_i}}^2 & \sum_{i}{f_{x_i} f_{y_i} } \\
\sum_{i}{f_{x_i} f_{y_i}} & \sum_{i}{f_{y_i}}^2
\end{bmatrix}^{-1}
\begin{bmatrix}
- \sum_{i}{f_{x_i} f_{t_i}} \\
- \sum_{i}{f_{y_i} f_{t_i}}
\end{bmatrix}\f]
(Note the similarity of the inverted matrix to the one used in the Harris corner detector. It
indicates that corners are better points to track.)
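As a minimal numerical sketch of that least-squares step for a single patch (the gradient values
below are random placeholders standing in for real image derivatives):
@code{.py}
import numpy as np

# fx, fy, ft: the 9 gradient values of one 3x3 patch (placeholders, not real derivatives)
rng = np.random.default_rng(0)
fx, fy, ft = rng.standard_normal((3, 9))

# normal equations of the over-determined system, exactly as written above
A = np.array([[np.sum(fx * fx), np.sum(fx * fy)],
              [np.sum(fx * fy), np.sum(fy * fy)]])
b = -np.array([np.sum(fx * ft), np.sum(fy * ft)])

u, v = np.linalg.solve(A, b)   # the flow (u, v) of this patch
@endcode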
So from the user's point of view the idea is simple: we give some points to track and we receive
the optical flow vectors of those points. But again there are some problems. Until now we were
dealing with small motions, so the method fails when there is large motion. To deal with this we
use pyramids. When we go up in the pyramid, small motions are removed and large motions become
small motions, so by applying Lucas-Kanade there we get the optical flow along with the scale.
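In the OpenCV API the pyramid is handled for you; a typical parameter set for the tracker might
look like the following sketch, where the exact values are only illustrative and `maxLevel`
controls how many pyramid levels are used:
@code{.py}
import cv2 as cv

# parameters later passed to cv.calcOpticalFlowPyrLK(); maxLevel > 0 enables the pyramid
lk_params = dict(winSize=(15, 15),
                 maxLevel=2,
                 criteria=(cv.TERM_CRITERIA_EPS | cv.TERM_CRITERIA_COUNT, 10, 0.03))
@endcode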
Lucas-Kanade Optical Flow in OpenCV
-----------------------------------
OpenCV provides all these in a single function, **cv.calcOpticalFlowPyrLK()**. Here, we create a
simple application which tracks some points in a video. To decide the points, we use
**cv.goodFeaturesToTrack()**. We take the first frame, detect some Shi-Tomasi corner points in it,
then we iteratively track those points using Lucas-Kanade optical flow. For the function
**cv.calcOpticalFlowPyrLK()** we pass the previous frame, the previous points and the next frame. It
returns the next points along with some status numbers, which have a value of 1 if the corresponding
next point was found, and 0 otherwise. We iteratively pass these next points as the previous points
in the next step. See the code below:
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/video/optical_flow/optical_flow.cpp)
- **Code at glance:**
@include samples/cpp/tutorial_code/video/optical_flow/optical_flow.cpp
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/video/optical_flow/optical_flow.py)
- **Code at glance:**
@include samples/python/tutorial_code/video/optical_flow/optical_flow.py
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/video/optical_flow/OpticalFlowDemo.java)
- **Code at glance:**
@include samples/java/tutorial_code/video/optical_flow/OpticalFlowDemo.java
@end_toggle
(This code doesn't check how correct the next keypoints are. So even if a feature point disappears
from the image, there is a chance that the optical flow finds a next point which merely looks close
to it. For robust tracking, corner points should therefore be re-detected at regular intervals. The
OpenCV samples include such an example, which re-detects feature points every 5 frames and also runs
a backward check on the obtained optical flow points to keep only the good ones. Check
samples/python/lk_track.py. A sketch of that backward check is shown below.)
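A minimal sketch of that backward check, assuming two consecutive grayscale frames loaded from
hypothetical files:
@code{.py}
import cv2 as cv
import numpy as np

# two consecutive frames (hypothetical file names)
prev_gray = cv.imread("frame0.png", cv.IMREAD_GRAYSCALE)
gray = cv.imread("frame1.png", cv.IMREAD_GRAYSCALE)

p0 = cv.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.3, minDistance=7)

# track forward, then track the found points backward to the first frame
p1, st, _ = cv.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
p0r, _, _ = cv.calcOpticalFlowPyrLK(gray, prev_gray, p1, None)

# keep only points whose round trip ends close to where it started
fb_error = np.abs(p0 - p0r).reshape(-1, 2).max(axis=1)
good = (st.ravel() == 1) & (fb_error < 1.0)
good_points = p1.reshape(-1, 2)[good]
@endcode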
See the results we got:
![image](images/opticalflow_lk.jpg)
Dense Optical Flow in OpenCV
----------------------------
The Lucas-Kanade method computes optical flow for a sparse feature set (in our example, corners
detected using the Shi-Tomasi algorithm). OpenCV provides another algorithm to find the dense
optical flow. It computes the optical flow for all the points in the frame. It is based on Gunnar
Farneback's algorithm, which is explained in "Two-Frame Motion Estimation Based on Polynomial
Expansion" by Gunnar Farneback in 2003.
The sample below shows how to find the dense optical flow using the above algorithm. We get a
2-channel array with the optical flow vectors \f$(u,v)\f$. We find their magnitude and direction and
color-code the result for better visualization: the direction corresponds to the Hue value of the
image and the magnitude to the Value plane. See the code below:
@add_toggle_cpp
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/video/optical_flow/optical_flow_dense.cpp)
- **Code at glance:**
@include samples/cpp/tutorial_code/video/optical_flow/optical_flow_dense.cpp
@end_toggle
@add_toggle_python
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/video/optical_flow/optical_flow_dense.py)
- **Code at glance:**
@include samples/python/tutorial_code/video/optical_flow/optical_flow_dense.py
@end_toggle
@add_toggle_java
- **Downloadable code**: Click
[here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/video/optical_flow/OpticalFlowDenseDemo.java)
- **Code at glance:**
@include samples/java/tutorial_code/video/optical_flow/OpticalFlowDenseDemo.java
@end_toggle
See the result below:
![image](images/opticalfb.jpg)


@@ -0,0 +1,168 @@
High level stitching API (Stitcher class) {#tutorial_stitcher}
=========================================
@tableofcontents
@prev_tutorial{tutorial_hdr_imaging}
@next_tutorial{tutorial_background_subtraction}
| | |
| -: | :- |
| Original author | Jiri Horner |
| Compatibility | OpenCV >= 3.2 |
Goal
----
In this tutorial you will learn how to:
- use the high-level stitching API for stitching provided by
- @ref cv::Stitcher
- learn how to use preconfigured Stitcher configurations to stitch images
using different camera models.
Code
----
This tutorial's code is shown in the lines below. You can also download it from
[here](https://github.com/opencv/opencv/tree/master/samples/cpp/stitching.cpp).
@include samples/cpp/stitching.cpp
Explanation
-----------
The most important code part is:
@snippet cpp/stitching.cpp stitching
A new instance of the stitcher is created and @ref cv::Stitcher::stitch will
do all the hard work.
@ref cv::Stitcher::create can create a stitcher in one of the predefined
configurations (argument `mode`). See @ref cv::Stitcher::Mode for details. These
configurations will set up multiple stitcher properties to operate in one of the
predefined scenarios. After you create a stitcher in one of the predefined
configurations, you can adjust stitching by setting any of the stitcher
properties.
If you have a CUDA device, @ref cv::Stitcher can be configured to offload certain
operations to the GPU. If you prefer this configuration, set `try_use_gpu` to true.
OpenCL acceleration will be used transparently based on global OpenCV settings
regardless of this flag.
Stitching might fail for several reasons; you should always check that
everything went well and that the resulting pano is stored in `pano`. See the
@ref cv::Stitcher::Status documentation for possible error codes.
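For reference, the same high-level flow in Python might look like the sketch below (assuming the
OpenCV 4.x Python bindings and some hypothetical input file names):
@code{.py}
import cv2 as cv

# images to stitch (hypothetical file names)
imgs = [cv.imread(f) for f in ("boat1.jpg", "boat2.jpg", "boat3.jpg")]

# PANORAMA is the default mode; cv.Stitcher_SCANS selects the affine pipeline instead
stitcher = cv.Stitcher_create(cv.Stitcher_PANORAMA)
status, pano = stitcher.stitch(imgs)

if status != cv.Stitcher_OK:   # always check the status code
    print("stitching failed, status =", status)
else:
    cv.imwrite("pano.jpg", pano)
@endcode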
Camera models
-------------
There are currently 2 camera models implemented in the stitching pipeline.
- _Homography model_ expecting perspective transformations between images
implemented in @ref cv::detail::BestOf2NearestMatcher cv::detail::HomographyBasedEstimator
cv::detail::BundleAdjusterReproj cv::detail::BundleAdjusterRay
- _Affine model_ expecting affine transformation with 6 DOF or 4 DOF implemented in
@ref cv::detail::AffineBestOf2NearestMatcher cv::detail::AffineBasedEstimator
cv::detail::BundleAdjusterAffine cv::detail::BundleAdjusterAffinePartial cv::AffineWarper
The homography model is useful for creating photo panoramas captured by a camera,
while the affine-based model can be used to stitch scans and objects captured by
specialized devices.
@note
Certain combinations of detailed @ref cv::Stitcher settings might not make sense. In particular,
you should not mix classes implementing the affine model with classes implementing the
homography model, as they work with different transformations.
Try it out
----------
If you enabled building the samples, you can find the binary under
`build/bin/cpp-example-stitching`. This example is a console application; run it without
arguments to see the help. `opencv_extra` provides some sample data for testing all available
configurations.
To try panorama mode, run:
```
./cpp-example-stitching --mode panorama <path to opencv_extra>/testdata/stitching/boat*
```
![](images/boat.jpg)
To try scans mode, run (dataset from a home-grade scanner):
```
./cpp-example-stitching --mode scans <path to opencv_extra>/testdata/stitching/newspaper*
```
![](images/newspaper.jpg)
or (dataset from a professional book scanner):
```
./cpp-example-stitching --mode scans <path to opencv_extra>/testdata/stitching/budapest*
```
![](images/budapest.jpg)
@note
The examples above expect a POSIX platform; on Windows you have to provide all file names explicitly
(e.g. `boat1.jpg` `boat2.jpg`...) as the Windows command line does not support `*` expansion.
Stitching detailed (Python, OpenCV > 4.0.1)
-------------------------------------------
If you want to study the internals of the stitching pipeline, or if you want to experiment with a
detailed configuration, you can use the stitching_detailed source code available in C++ or Python.
<H4>stitching_detailed</H4>
@add_toggle_cpp
[stitching_detailed.cpp](https://raw.githubusercontent.com/opencv/opencv/master/samples/cpp/stitching_detailed.cpp)
@end_toggle
@add_toggle_python
[stitching_detailed.py](https://raw.githubusercontent.com/opencv/opencv/master/samples/python/stitching_detailed.py)
@end_toggle
The stitching_detailed program takes its stitching parameters from the command line. Many parameters exist. The example below shows some of the possible command line parameters:
boat5.jpg boat2.jpg boat3.jpg boat4.jpg boat1.jpg boat6.jpg --work_megapix 0.6 --features orb --matcher homography --estimator homography --match_conf 0.3 --conf_thresh 0.3 --ba ray --ba_refine_mask xxxxx --save_graph test.txt --wave_correct no --warp fisheye --blend multiband --expos_comp no --seam gc_colorgrad
![](images/fisheye.jpg)
Pairs of images are matched using a homography (--matcher homography) and the estimator used for the transformation estimation is also homography-based (--estimator homography).
The confidence for the feature matching step is 0.3 (--match_conf 0.3). You can decrease this value if you have difficulties matching images.
The threshold for the confidence that two images are from the same panorama is 0.3 (--conf_thresh 0.3). You can decrease this value if you have difficulties matching images.
The bundle adjustment cost function is ray (--ba ray).
The refinement mask for bundle adjustment is xxxxx (--ba_refine_mask xxxxx), where 'x' means refine the respective parameter and '_' means do not refine it. The mask has the following format: fx,skew,ppx,aspect,ppy.
The matches graph, represented in DOT language, is saved to test.txt (--save_graph test.txt). Labels description: Nm is the number of matches, Ni is the number of inliers, C is the confidence.
![](images/gvedit.jpg)
Wave effect correction is disabled (--wave_correct no).
The warp surface type is fisheye (--warp fisheye).
The blending method is multiband (--blend multiband).
Exposure compensation is not used (--expos_comp no).
The seam estimator is the minimum graph cut-based seam estimator (--seam gc_colorgrad).
You can also use these arguments on the command line:
boat5.jpg boat2.jpg boat3.jpg boat4.jpg boat1.jpg boat6.jpg --work_megapix 0.6 --features orb --matcher homography --estimator homography --match_conf 0.3 --conf_thresh 0.3 --ba ray --ba_refine_mask xxxxx --wave_correct horiz --warp compressedPlaneA2B1 --blend multiband --expos_comp channels_blocks --seam gc_colorgrad
You will get:
![](images/compressedPlaneA2B1.jpg)
For images captured using a scanner or a drone (affine motion), you can use these arguments on the command line:
newspaper1.jpg newspaper2.jpg --work_megapix 0.6 --features surf --matcher affine --estimator affine --match_conf 0.3 --conf_thresh 0.3 --ba affine --ba_refine_mask xxxxx --wave_correct no --warp affine
![](images/affinepano.jpg)
You can find all the images at https://github.com/opencv/opencv_extra/tree/master/testdata/stitching


@@ -0,0 +1,13 @@
Other tutorials (ml, objdetect, photo, stitching, video) {#tutorial_table_of_content_other}
========================================================
- photo. @subpage tutorial_hdr_imaging
- stitching. @subpage tutorial_stitcher
- video. @subpage tutorial_background_subtraction
- video. @subpage tutorial_meanshift
- video. @subpage tutorial_optical_flow
- objdetect. @subpage tutorial_cascade_classifier
- objdetect. @subpage tutorial_traincascade
- ml. @subpage tutorial_introduction_to_svm
- ml. @subpage tutorial_non_linear_svms
- ml. @subpage tutorial_introduction_to_pca


@@ -0,0 +1,222 @@
Cascade Classifier Training {#tutorial_traincascade}
===========================
@tableofcontents
@prev_tutorial{tutorial_cascade_classifier}
@next_tutorial{tutorial_introduction_to_svm}
Introduction
------------
Working with a boosted cascade of weak classifiers includes two major stages: the training and the detection stage. The detection stage, using either HAAR or LBP based models, is described in the @ref tutorial_cascade_classifier "object detection tutorial". This documentation gives an overview of the functionality needed to train your own boosted cascade of weak classifiers. The current guide will walk through all the different stages: collecting training data, preparing the training data and executing the actual model training.
To support this tutorial, several official OpenCV applications will be used: [opencv_createsamples](https://github.com/opencv/opencv/tree/master/apps/createsamples), [opencv_annotation](https://github.com/opencv/opencv/tree/master/apps/annotation), [opencv_traincascade](https://github.com/opencv/opencv/tree/master/apps/traincascade) and [opencv_visualisation](https://github.com/opencv/opencv/tree/master/apps/visualisation).
### Important notes
- If you come across any tutorial mentioning the old opencv_haartraining tool <i>(which is deprecated and still uses the OpenCV 1.x interface)</i>, then please ignore that tutorial and stick to the opencv_traincascade tool. This tool is a newer version, written in C++ in accordance with the OpenCV 2.x and OpenCV 3.x API. opencv_traincascade supports both HAAR-like wavelet features @cite Viola01 and LBP (Local Binary Patterns) @cite Liao2007 features. LBP features yield integer precision in contrast to HAAR features, which yield floating point precision, so both training and detection with LBP are several times faster than with HAAR features. The LBP and HAAR detection quality mainly depends on the training data used and the training parameters selected. It's possible to train an LBP-based classifier that will provide almost the same quality as a HAAR-based one within a fraction of the training time.
- The newer cascade classifier detection interface from OpenCV 2.x and OpenCV 3.x (@ref cv::CascadeClassifier) supports working with both old and new model formats. opencv_traincascade can even save (export) a trained cascade in the older format if for some reason you are stuck using the old interface. At least training the model could then be done in the most stable interface.
- The opencv_traincascade application can use TBB for multi-threading. To use it in multicore mode OpenCV must be built with TBB support enabled.
Preparation of the training data
--------------------------------
For training a boosted cascade of weak classifiers we need a set of positive samples (containing actual objects you want to detect) and a set of negative images (containing everything you do not want to detect). The set of negative samples must be prepared manually, whereas the set of positive samples is created using the opencv_createsamples application.
### Negative Samples
Negative samples are taken from arbitrary images that do not contain the objects you want to detect. These negative images, from which the samples are generated, should be listed in a special negative image file containing one image path per line <i>(which can be absolute or relative)</i>. Note that negative samples and sample images are also called background samples or background images, and these terms are used interchangeably in this document.
The described images may be of different sizes. However, each image should be equal to or larger than the desired training window size <i>(which corresponds to the model dimensions, most of the time being the average size of your object)</i>, because these images are used to subsample a given negative image into several image samples having this training window size.
An example of such a negative description file:
Directory structure:
@code{.text}
/img
img1.jpg
img2.jpg
bg.txt
@endcode
File bg.txt:
@code{.text}
img/img1.jpg
img/img2.jpg
@endcode
Your set of negative window samples will be used to tell the machine learning step, boosting in this case, what not to look for when trying to find your objects of interest.
### Positive Samples
Positive samples are created by the opencv_createsamples application. They are used by the boosting process to define what the model should actually look for when trying to find your objects of interest. The application supports two ways of generating a positive sample dataset.
1. You can generate a bunch of positives from a single positive object image.
2. You can supply all the positives yourself and only use the tool to cut them out, resize them and put them in the opencv needed binary format.
While the first approach works decently for fixed objects, like very rigid logos, it tends to fail rather soon for less rigid objects. In that case we suggest using the second approach. Many tutorials on the web even state that 100 real object images can lead to a better model than 1000 artificially generated positives created with the opencv_createsamples application. If you do decide to take the first approach, keep some things in mind:
- Please note that you need more than a single positive sample before you give it to the mentioned application, because it only applies perspective transformations.
- If you want a robust model, take samples that cover the wide range of varieties that can occur within your object class. For example, in the case of faces you should consider different races and age groups, emotions and perhaps beard styles. This also applies when using the second approach.
The first approach takes a single object image with for example a company logo and creates a large set of positive samples from the given object image by randomly rotating the object, changing the image intensity as well as placing the image on arbitrary backgrounds. The amount and range of randomness can be controlled by command line arguments of the opencv_createsamples application.
Command line arguments:
- `-vec <vec_file_name>` : Name of the output file containing the positive samples for training.
- `-img <image_file_name>` : Source object image (e.g., a company logo).
- `-bg <background_file_name>` : Background description file; contains a list of images which are used as a background for randomly distorted versions of the object.
- `-num <number_of_samples>` : Number of positive samples to generate.
- `-bgcolor <background_color>` : Background color (currently grayscale images are assumed); the background color denotes the transparent color. Since there might be compression artifacts, the amount of color tolerance can be specified by -bgthresh. All pixels within bgcolor-bgthresh and bgcolor+bgthresh range are interpreted as transparent.
- `-bgthresh <background_color_threshold>`
- `-inv` : If specified, colors will be inverted.
- `-randinv` : If specified, colors will be inverted randomly.
- `-maxidev <max_intensity_deviation>` : Maximal intensity deviation of pixels in foreground samples.
- `-maxxangle <max_x_rotation_angle>` : Maximal rotation angle towards x-axis, must be given in radians.
- `-maxyangle <max_y_rotation_angle>` : Maximal rotation angle towards y-axis, must be given in radians.
- `-maxzangle <max_z_rotation_angle>` : Maximal rotation angle towards z-axis, must be given in radians.
- `-show` : Useful debugging option. If specified, each sample will be shown. Pressing Esc will continue the samples creation process without showing each sample.
- `-w <sample_width>` : Width (in pixels) of the output samples.
- `-h <sample_height>` : Height (in pixels) of the output samples.
When running opencv_createsamples in this way, the following procedure is used to create a sample object instance: The given source image is rotated randomly around all three axes. The chosen angle is limited by `-maxxangle`, `-maxyangle` and `-maxzangle`. Then pixels having the intensity from the [bg_color-bg_color_threshold; bg_color+bg_color_threshold] range are interpreted as transparent. White noise is added to the intensities of the foreground. If the `-inv` key is specified then foreground pixel intensities are inverted. If `-randinv` key is specified then algorithm randomly selects whether inversion should be applied to this sample. Finally, the obtained image is placed onto an arbitrary background from the background description file, resized to the desired size specified by `-w` and `-h` and stored to the vec-file, specified by the `-vec` command line option.
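For example, a hypothetical invocation that generates 1000 distorted 24x24 samples from a single logo image could look like this (all file names and values are placeholders):
@code{.text}
opencv_createsamples -img logo.png -bg bg.txt -vec samples.vec -num 1000 -bgcolor 0 -bgthresh 8 -maxxangle 0.5 -maxyangle 0.5 -maxzangle 0.5 -w 24 -h 24
@endcode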
Positive samples may also be obtained from a collection of previously marked up images, which is the preferred way when building robust object models. This collection is described by a text file similar to the background description file. Each line of this file corresponds to an image. The first element of the line is the filename, followed by the number of object annotations, followed by numbers describing the coordinates of the objects' bounding rectangles (x, y, width, height).
An example of description file:
Directory structure:
@code{.text}
/img
img1.jpg
img2.jpg
info.dat
@endcode
File info.dat:
@code{.text}
img/img1.jpg 1 140 100 45 45
img/img2.jpg 2 100 200 50 50 50 30 25 25
@endcode
Image img1.jpg contains a single object instance with the following bounding rectangle coordinates:
(140, 100, 45, 45). Image img2.jpg contains two object instances.
In order to create positive samples from such a collection, the `-info` argument should be specified instead of `-img`:
- `-info <collection_file_name>` : Description file of marked up images collection.
Note that in this case, parameters like `-bg, -bgcolor, -bgthreshold, -inv, -randinv, -maxxangle, -maxyangle, -maxzangle` are simply ignored and not used. The scheme of sample creation in this case is as follows. The object instances are taken from the given images by cutting out the supplied bounding boxes from the original images. Then they are resized to the target sample size (defined by `-w` and `-h`) and stored in the output vec-file, defined by the `-vec` parameter. No distortion is applied, so the only arguments that have an effect are `-w`, `-h`, `-show` and `-num`.
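A hypothetical invocation for this mode could look like this (only `-w`, `-h`, `-show` and `-num` influence the result, as noted above; the file names are placeholders):
@code{.text}
opencv_createsamples -info info.dat -vec samples.vec -num 1000 -w 24 -h 24
@endcode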
The manual process of creating the `-info` file can also be done by using the opencv_annotation tool. This is an open source tool for visually selecting the regions of interest of your object instances in any given image. The following subsection will discuss in more detail how to use this application.
#### Extra remarks
- The opencv_createsamples utility may be used to examine the samples stored in any given positive samples file. In order to do this, only the `-vec`, `-w` and `-h` parameters should be specified.
- An example vec-file is available at `opencv/data/vec_files/trainingfaces_24-24.vec`. It can be used to train a face detector with the following window size: `-w 24 -h 24`.
### Using OpenCV's integrated annotation tool
Since OpenCV 3.x the community has been supplying and maintaining an open source annotation tool, used for generating the `-info` file. The tool can be accessed through the command opencv_annotation if the OpenCV applications were built.
Using the tool is quite straightforward. The tool accepts several required and some optional parameters:
- `--annotations` <b>(required)</b> : path to annotations txt file, where you want to store your annotations, which is then passed to the `-info` parameter [example - /data/annotations.txt]
- `--images` <b>(required)</b> : path to folder containing the images with your objects [example - /data/testimages/]
- `--maxWindowHeight` <i>(optional)</i> : if the input image is larger in height than the resolution given here, resize the image for easier annotation using `--resizeFactor`.
- `--resizeFactor` <i>(optional)</i> : factor used to resize the input image when using the `--maxWindowHeight` parameter.
Note that the optional parameters can only be used together (an example using them is shown at the end of this subsection). An example of a basic command can be seen below:
@code{.text}
opencv_annotation --annotations=/path/to/annotations/file.txt --images=/path/to/image/folder/
@endcode
This command will fire up a window containing the first image and your mouse cursor, which will be used for annotation. A video on how to use the annotation tool can be found [here](https://www.youtube.com/watch?v=EV5gmvoCTSk). Basically, several keystrokes trigger an action. A click of the left mouse button selects the first corner of your bounding box; drawing then continues until you are satisfied, and stops when a second left mouse button click is registered. After each selection you have the following choices:
- Pressing `c` : confirm the annotation, turning the annotation green and confirming it is stored
- Pressing `d` : delete the last annotation from the list of annotations (easy for removing wrong annotations)
- Pressing `n` : continue to the next image
- Pressing `ESC` : this will exit the annotation software
Finally you will end up with a usable annotation file that can be passed to the `-info` argument of opencv_createsamples.
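If your images do not fit on the screen, a hypothetical invocation adding the two optional parameters (which, as noted above, must be used together) could look like this:
@code{.text}
opencv_annotation --annotations=/path/to/annotations/file.txt --images=/path/to/image/folder/ --maxWindowHeight=800 --resizeFactor=2
@endcode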
Cascade Training
----------------
The next step is the actual training of the boosted cascade of weak classifiers, based on the positive and negative dataset that was prepared beforehand.
Command line arguments of the opencv_traincascade application, grouped by purpose (an example invocation follows the list):
- Common arguments:
- `-data <cascade_dir_name>` : Where the trained classifier should be stored. This folder should be created manually beforehand.
- `-vec <vec_file_name>` : vec-file with positive samples (created by opencv_createsamples utility).
- `-bg <background_file_name>` : Background description file. This is the file containing the negative sample images.
- `-numPos <number_of_positive_samples>` : Number of positive samples used in training for every classifier stage.
- `-numNeg <number_of_negative_samples>` : Number of negative samples used in training for every classifier stage.
- `-numStages <number_of_stages>` : Number of cascade stages to be trained.
- `-precalcValBufSize <precalculated_vals_buffer_size_in_Mb>` : Size of the buffer for precalculated feature values (in Mb). The more memory you assign the faster the training process, however keep in mind that `-precalcValBufSize` and `-precalcIdxBufSize` combined should not exceed your available system memory.
- `-precalcIdxBufSize <precalculated_idxs_buffer_size_in_Mb>` : Size of the buffer for precalculated feature indices (in Mb). The more memory you assign the faster the training process, however keep in mind that `-precalcValBufSize` and `-precalcIdxBufSize` combined should not exceed your available system memory.
- `-baseFormatSave` : This argument is only relevant for Haar-like features. If it is specified, the cascade will be saved in the old format. This is only available for backwards compatibility reasons and to allow users stuck with the old deprecated interface to at least train models using the newer interface.
- `-numThreads <max_number_of_threads>` : Maximum number of threads to use during training. Notice that the actual number of used threads may be lower, depending on your machine and compilation options. By default, the maximum available threads are selected if you built OpenCV with TBB support, which is needed for this optimization.
- `-acceptanceRatioBreakValue <break_value>` : This argument is used to determine how precise your model should keep learning and when to stop. A good guideline is to train not further than 10e-5, to ensure the model does not overtrain on your training data. By default this value is set to -1 to disable this feature.
- Cascade parameters:
- `-stageType <BOOST(default)>` : Type of stages. Only boosted classifiers are supported as a stage type at the moment.
- `-featureType<{HAAR(default), LBP}>` : Type of features: HAAR - Haar-like features, LBP - local binary patterns.
- `-w <sampleWidth>` : Width of training samples (in pixels). Must have exactly the same value as used during training samples creation (opencv_createsamples utility).
- `-h <sampleHeight>` : Height of training samples (in pixels). Must have exactly the same value as used during training samples creation (opencv_createsamples utility).
- Boosted classifier parameters:
- `-bt <{DAB, RAB, LB, GAB(default)}>` : Type of boosted classifiers: DAB - Discrete AdaBoost, RAB - Real AdaBoost, LB - LogitBoost, GAB - Gentle AdaBoost.
- `-minHitRate <min_hit_rate>` : Minimal desired hit rate for each stage of the classifier. Overall hit rate may be estimated as (min_hit_rate ^ number_of_stages), @cite Viola04 §4.1.
- `-maxFalseAlarmRate <max_false_alarm_rate>` : Maximal desired false alarm rate for each stage of the classifier. Overall false alarm rate may be estimated as (max_false_alarm_rate ^ number_of_stages), @cite Viola04 §4.1.
- `-weightTrimRate <weight_trim_rate>` : Specifies whether trimming should be used and its weight. A decent choice is 0.95.
- `-maxDepth <max_depth_of_weak_tree>` : Maximal depth of a weak tree. A decent choice is 1, that is, the case of stumps.
- `-maxWeakCount <max_weak_tree_count>` : Maximal count of weak trees for every cascade stage. The boosted classifier (stage) will have as many weak trees (<= maxWeakCount) as needed to achieve the given `-maxFalseAlarmRate`.
- Haar-like feature parameters:
- `-mode <BASIC (default) | CORE | ALL>` : Selects the type of Haar feature set used in training. BASIC uses only upright features, while ALL uses the full set of upright and 45-degree rotated features. See @cite Lienhart02 for more details.
- Local Binary Patterns parameters: Local Binary Patterns don't have parameters.
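Putting a subset of these arguments together, a hypothetical LBP training run could look like the following (all names and values are placeholders; remember that the `-data` folder must exist beforehand):
@code{.text}
opencv_traincascade -data cascade_dir -vec samples.vec -bg bg.txt -numPos 900 -numNeg 450 -numStages 15 -w 24 -h 24 -featureType LBP -precalcValBufSize 1024 -precalcIdxBufSize 1024
@endcode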
After the opencv_traincascade application has finished its work, the trained cascade will be saved in `cascade.xml` file in the `-data` folder. Other files in this folder are created for the case of interrupted training, so you may delete them after completion of training.
Training is finished and you can test your cascade classifier!
Visualising Cascade Classifiers
-------------------------------
From time to time it can be useful to visualise the trained cascade, to see which features it selected and how complex its stages are. For this OpenCV supplies the opencv_visualisation application. This application has the following parameters:
- `--image` <b>(required)</b> : path to a reference image for your object model. This should be an annotation with dimensions [`-w`,`-h`] as passed to both opencv_createsamples and opencv_traincascade application.
- `--model` <b>(required)</b> : path to the trained model, which should be in the folder supplied to the `-data` parameter of the opencv_traincascade application.
- `--data` <i>(optional)</i> : if a data folder is supplied, which has to be manually created beforehand, stage output and a video of the features will be stored.
An example command can be seen below
@code{.text}
opencv_visualisation --image=/data/object.png --model=/data/model.xml --data=/data/result/
@endcode
Some limitations of the current visualisation tool
- Only handles cascade classifier models, trained with the opencv_traincascade tool, containing __stumps__ as decision trees [default settings].
- The image provided needs to be a sample window with the original model dimensions, passed to the `--image` parameter.
Example of the HAAR/LBP face model run on a given window of Angelina Jolie, which had the same preprocessing as the cascade classifier files --> a 24x24 pixel image, grayscale conversion and histogram equalisation:
_A video is made visualising each feature of each stage:_
![](images/visualisation_video.png)
_Each stage is stored as an image for future validation of the features:_
![](images/visualisation_single_stage.png)
_This work was created for [OpenCV 3 Blueprints](https://www.packtpub.com/application-development/opencv-3-blueprints) by StevenPuttemans, but Packt Publishing agreed to its integration into OpenCV._