init - Initialize project
doc/py_tutorials/py_ml/py_kmeans/py_kmeans_index.markdown

K-Means Clustering {#tutorial_py_kmeans_index}
==================

-   @subpage tutorial_py_kmeans_understanding

    Read to get an intuitive understanding of K-Means Clustering

-   @subpage tutorial_py_kmeans_opencv

    Now let's try the K-Means functions in OpenCV

K-Means Clustering in OpenCV {#tutorial_py_kmeans_opencv}
============================

Goal
----
-   Learn to use the **cv.kmeans()** function in OpenCV for data clustering

Understanding Parameters
------------------------

### Input parameters
-#  **samples** : It should be of **np.float32** data type, and each feature should be put in a
    single column.
-#  **nclusters(K)** : The number of clusters required at the end.
-#  **criteria** : The iteration termination criteria. When these criteria are satisfied, the
    algorithm iteration stops. It should be a tuple of 3 parameters, `( type, max_iter, epsilon )`:
    -#  type of termination criteria. It has 3 flags as below:
        -   **cv.TERM_CRITERIA_EPS** - stop the algorithm iteration if the specified accuracy, *epsilon*, is reached.
        -   **cv.TERM_CRITERIA_MAX_ITER** - stop the algorithm after the specified number of iterations, *max_iter*.
        -   **cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER** - stop the iteration when either of the above conditions is met.
    -#  max_iter - An integer specifying the maximum number of iterations.
    -#  epsilon - The required accuracy.
-#  **attempts** : The number of times the algorithm is executed using different initial
    labellings. The algorithm returns the labels that yield the best compactness, and this
    compactness is returned as an output.
-#  **flags** : This flag is used to specify how the initial centers are taken. Normally two flags
    are used for this: **cv.KMEANS_PP_CENTERS** and **cv.KMEANS_RANDOM_CENTERS**.

### Output parameters

-#  **compactness** : It is the sum of squared distances from each point to its corresponding center.
-#  **labels** : This is the label array (same as 'code' in the previous article) where each
    element is marked '0', '1', etc.
-#  **centers** : This is the array of cluster centers.

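
Before the examples, here is a minimal sketch (with illustrative data) of how these parameters fit
together in a single call:
@code{.py}
import numpy as np
import cv2 as cv

# samples: np.float32, one feature per column, one sample per row
samples = np.float32(np.random.randint(0, 256, (50, 1)))

# criteria = ( type, max_iter, epsilon )
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# K = 2 clusters, 10 attempts, random initial centers
compactness, labels, centers = cv.kmeans(samples, 2, None, criteria, 10,
                                         cv.KMEANS_RANDOM_CENTERS)
@endcode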
Now we will see how to apply the K-Means algorithm with three examples.

1. Data with Only One Feature
-----------------------------

Consider a set of data with only one feature, i.e. one-dimensional. For example, we can take our
t-shirt problem where only the heights of people are used to decide the t-shirt size.

So we start by creating the data and plotting it in Matplotlib:
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

# 25 random values in [25,100) and 25 in [175,255): two well separated groups
x = np.random.randint(25,100,25)
y = np.random.randint(175,255,25)
z = np.hstack((x,y))
z = z.reshape((50,1))   # one sample per row, one feature per column
z = np.float32(z)       # cv.kmeans() expects np.float32 input
plt.hist(z,256,[0,256]),plt.show()
@endcode
So we have 'z', an array of size 50 with values ranging from 0 to 255. I have reshaped 'z' into a
column vector. This will be more useful when more than one feature is present. Then I made the data
of np.float32 type.

We get the following image:



Now we apply the KMeans function. Before that, we need to specify the criteria. My criteria are
such that whenever 10 iterations of the algorithm are run, or an accuracy of epsilon = 1.0 is
reached, the algorithm stops and returns the answer.
@code{.py}
# Define criteria = ( type, max_iter = 10 , epsilon = 1.0 )
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Set flags (Just to avoid line break in the code)
flags = cv.KMEANS_RANDOM_CENTERS

# Apply KMeans
compactness,labels,centers = cv.kmeans(z,2,None,criteria,10,flags)
@endcode
This gives us the compactness, labels and centers. In this case, I got centers of 60 and 207. The
labels array has the same size as the test data, with each data point labelled '0', '1', '2', etc.
depending on its centroid. Now we split the data into different clusters depending on their labels.
@code{.py}
A = z[labels==0]
B = z[labels==1]
@endcode
Now we plot A in red, B in blue, and their centroids in yellow.
@code{.py}
# Now plot 'A' in red, 'B' in blue, 'centers' in yellow
plt.hist(A,256,[0,256],color = 'r')
plt.hist(B,256,[0,256],color = 'b')
plt.hist(centers,32,[0,256],color = 'y')
plt.show()
@endcode
Below is the output we got:



2. Data with Multiple Features
------------------------------

In the previous example, we took only height for the t-shirt problem. Here, we will take both
height and weight, i.e. two features.

Remember, in the previous case, we made our data into a single column vector. Each feature is
arranged in a column, while each row corresponds to an input sample.

For example, in this case, we set up test data of size 50x2: the heights and weights of 50 people.
The first column corresponds to the heights of all 50 people and the second column to their
weights. The first row contains two elements, where the first is the height of the first person
and the second is their weight. Similarly, the remaining rows correspond to the heights and
weights of the other people. Check the image below:



Now let's move directly to the code:
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

# 25 samples in the low range and 25 in the high range, two features each
X = np.random.randint(25,50,(25,2))
Y = np.random.randint(60,85,(25,2))
Z = np.vstack((X,Y))

# convert to np.float32
Z = np.float32(Z)

# define criteria and apply kmeans()
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret,label,center=cv.kmeans(Z,2,None,criteria,10,cv.KMEANS_RANDOM_CENTERS)

# Now separate the data. Note the ravel()
A = Z[label.ravel()==0]
B = Z[label.ravel()==1]

# Plot the data
plt.scatter(A[:,0],A[:,1])
plt.scatter(B[:,0],B[:,1],c = 'r')
plt.scatter(center[:,0],center[:,1],s = 80,c = 'y', marker = 's')
plt.xlabel('Height'),plt.ylabel('Weight')
plt.show()
@endcode
Below is the output we get:



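
Note that **cv.kmeans()** has no separate predict step. If you later want to assign a new sample to
one of the learned clusters, you can pick the nearest center yourself. A minimal sketch, reusing
`center` from the code above (the new sample values here are made up):
@code{.py}
# assign a new (height, weight) sample to the nearest learned center
new_sample = np.float32([40, 70])
distances = np.linalg.norm(center - new_sample, axis=1)
cluster = np.argmin(distances)   # index of the closest center
@endcode
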
3. Color Quantization
---------------------

Color Quantization is the process of reducing the number of colors in an image. One reason to do
so is to reduce memory usage. Sometimes, a device may be limited such that it can produce only a
limited number of colors. In those cases too, color quantization is performed. Here we use k-means
clustering for color quantization.

There is nothing new to be explained here. There are 3 features, say R, G, B. So we need to reshape
the image to an array of size Mx3 (M is the number of pixels in the image). After the clustering,
we apply the centroid values (which are also R, G, B) to all pixels, so that the resulting image
has the specified number of colors. And again we need to reshape it back to the shape of the
original image. Below is the code:
@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('home.jpg')
Z = img.reshape((-1,3))   # Mx3 array of pixels

# convert to np.float32
Z = np.float32(Z)

# define criteria, number of clusters(K) and apply kmeans()
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 8
ret,label,center=cv.kmeans(Z,K,None,criteria,10,cv.KMEANS_RANDOM_CENTERS)

# Now convert back into uint8, and rebuild the image
center = np.uint8(center)
res = center[label.flatten()]
res2 = res.reshape((img.shape))

cv.imshow('res2',res2)
cv.waitKey(0)
cv.destroyAllWindows()
@endcode
See the result below for K=8:



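
As a quick sanity check (a hedged sketch, reusing the variables from the code above), the quantized
image should contain at most K distinct colors:
@code{.py}
# count the distinct colors remaining in the quantized image
num_colors = len(np.unique(res2.reshape(-1,3), axis=0))
print(num_colors)   # at most 8 for K = 8
@endcode
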
Additional Resources
--------------------

Exercises
---------

Understanding K-Means Clustering {#tutorial_py_kmeans_understanding}
================================

Goal
----
In this chapter, we will understand the concepts of K-Means Clustering and how it works.

Theory
------

We will explain this with a commonly used example.
### T-shirt size problem

Consider a company that is going to release a new model of T-shirt to the market. Obviously they
will have to manufacture models of different sizes to satisfy people of all sizes. So the company
collects data on people's heights and weights, and plots them on a graph, as below:



The company can't create t-shirts of every possible size. Instead, they divide people into Small,
Medium and Large, and manufacture only these 3 models, which will fit all the people. This grouping
of people into three groups can be done by k-means clustering, and the algorithm provides us with
the best 3 sizes, which will satisfy all the people. And if it doesn't, the company can divide
people into more groups, maybe five, and so on. Check the image below:



### How does it work?

This algorithm is an iterative process. We will explain it step-by-step with the help of images.

Consider a set of data as below (you can consider it as the t-shirt problem). We need to cluster
this data into two groups.



**Step 1:** The algorithm randomly chooses two centroids, \f$C1\f$ and \f$C2\f$ (sometimes, any
two data points are taken as the centroids).

**Step 2:** It calculates the distance from each point to both centroids. If a data point is
closer to \f$C1\f$, it is labelled '0'. If it is closer to \f$C2\f$, it is labelled '1' (if there
are more centroids, they are labelled '2', '3', etc.).

In our case, we will color all points labelled '0' red and all points labelled '1' blue. So we get
the following image after the above operations.



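
As a rough NumPy sketch of this labelling rule on toy 1-D data (the values and names here are
illustrative, not from the tutorial):
@code{.py}
import numpy as np

data = np.float32([1, 2, 8, 9, 10])   # toy 1-D points
C1, C2 = 2.0, 9.0                     # current centroids

# label 0 if closer to C1, else label 1
labels = np.where(np.abs(data - C1) <= np.abs(data - C2), 0, 1)
@endcode
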
**Step 3:** Next we calculate the averages of all the blue points and of all the red points
separately, and those become our new centroids. That is, \f$C1\f$ and \f$C2\f$ shift to the newly
calculated centroids. (Remember, the images shown are not true values and not to true scale; they
are for demonstration only.)

And again, perform Step 2 with the new centroids and label the data '0' and '1'.

So we get the result below:



Now **Step 2** and **Step 3** are iterated until both centroids converge to fixed points. *(Or the
iteration may be stopped depending on the criteria we provide, such as a maximum number of
iterations or a specific accuracy being reached.)* **These points are such that the sum of the
distances between the data points and their corresponding centroids is minimum**. Or simply, the
sum of the distances \f$C1 \leftrightarrow Red\_Points\f$ and \f$C2 \leftrightarrow
Blue\_Points\f$ is minimum.

\f[minimize \;\bigg[J = \sum_{All\: Red\_Points}distance(C1,Red\_Point) + \sum_{All\: Blue\_Points}distance(C2,Blue\_Point)\bigg]\f]
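
To make the iteration concrete, here is a minimal NumPy sketch of the assign/update loop on toy
1-D data (illustrative only; **cv.kmeans()** does all of this internally):
@code{.py}
import numpy as np

data = np.float32([1, 2, 8, 9, 10])
C1, C2 = float(data[0]), float(data[1])   # start from two arbitrary points

for _ in range(10):   # or stop once the centroids no longer move
    # Step 2: label each point by its nearest centroid
    labels = np.where(np.abs(data - C1) <= np.abs(data - C2), 0, 1)
    # Step 3: move each centroid to the mean of its points
    C1 = float(data[labels == 0].mean())
    C2 = float(data[labels == 1].mean())
@endcode
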
The final result looks almost like this:



So this is just an intuitive understanding of K-Means Clustering. For more details and a
mathematical explanation, please read any standard machine learning textbook or check the links in
the additional resources below. This is just the top layer of K-Means clustering. There are many
modifications to this algorithm, such as how to choose the initial centroids and how to speed up
the iteration process.

Additional Resources
--------------------

-#  [Machine Learning Course](https://www.coursera.org/course/ml), video lectures by Prof. Andrew Ng
    (some of the images are taken from this)

Exercises
---------