Showing posts with label PRINCIPAL COMPONENT ANALYSIS (PCA). Show all posts

Sunday, 22 January 2023

PRINCIPAL COMPONENT ANALYSIS (PCA)

(PCA): - Dimension reduction analysis


-> It is a technique for feature extraction from a given data set.


-> A data set with N features (dimensions) has N principal components.


-> In many data sets, the first few principal components capture most of the variance in the data; a common target is to keep enough components to explain about 95% of it.


-> Therefore, we keep only the first n of the N principal components; the choice of n is determined by the precision we are aiming for.


-> So, PCA reduces the N original features to n principal components; N >> n


-> Consequently, another name for this method is dimension reduction analysis.


-> Example: Consider the data sets X and Y.


            X     =     1,     2,     3,     4,     5,     6,     7,     8,     9,     10.

            Y     =     1,     4,     9,    16,   25,   36,   49,   64,    81,  100.

Here, the number of features = 2  and the number of samples = 10.


The steps for computing PCA are given as follows:

Step 1:    Generate the covariance matrix for datasets X and Y. 


$A_{(2 X 2)} = \begin{bmatrix} Cov(X, X) & Cov(X, Y) \\ Cov(Y, X) & Cov(Y, Y) \end{bmatrix}$


$\begin{align}Cov{(X, Y)}=\sum_{i=1}^N\frac{(x_i-\mu_X)(y_i-\mu_Y)}{N}\end{align}$

where $\mu_X$ and $\mu_Y$ are the means of the data sets $X$ and $Y$, respectively, and $N$ in this formula is the number of samples (here $N = 10$).



$A_{(2 X 2)} = \begin{bmatrix} 8.25 & 90.75 \\ 90.75 & 1051.05 \end{bmatrix}$
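The entries of this matrix can be checked with a few lines of plain Python. This is only a minimal sketch: the helper names `mean` and `cov` are chosen here for illustration, and the covariance divides by the number of samples, as in the formula above.

```python
# Example data from the text.
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance: average product of deviations from the means."""
    mu_a, mu_b = mean(a), mean(b)
    return sum((ai - mu_a) * (bi - mu_b) for ai, bi in zip(a, b)) / len(a)

# 2 x 2 covariance matrix, laid out as in Step 1.
A = [[cov(X, X), cov(X, Y)],
     [cov(Y, X), cov(Y, Y)]]

for row in A:
    print([round(v, 2) for v in row])
```

Note that many statistics libraries divide by N - 1 by default (the sample covariance), which would give slightly larger values than the ones used in this post.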


Step 2:    Generate the characteristic equation by using the covariance matrix $A_{(2 X 2)}$.


    Note:-  det ($A_{(2 X 2)}$ - $ \lambda $ I ) = 0 is the characteristic equation, where I is the unit (identity) matrix.


$\det \begin{bmatrix} 8.25 - \lambda & 90.75 \\ 90.75 & 1051.05 - \lambda \end{bmatrix} = 0$


   $\implies (8.25 - \lambda) (1051.05 - \lambda) - 90.75 * 90.75 = 0 $



$\implies  \lambda^{2} - 1059.3 \lambda + 435.6 = 0 $   .  . .   (1)                  

$ \implies \lambda_{1}=1058.89,      \lambda_{2}=0.411375$ .  

        Here $\lambda_{1}$ and $\lambda_{2}$ are the eigenvalues of the matrix $A_{(2 X 2)}$.
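For a 2 x 2 matrix, the characteristic equation is a quadratic whose coefficients are the trace and the determinant, so the roots can be reproduced with the quadratic formula. The sketch below assumes the matrix entries quoted in the text; the variable names are illustrative.

```python
import math

# Covariance matrix entries from the text: A = [[a, b], [b, d]].
a, b, d = 8.25, 90.75, 1051.05

trace = a + d          # coefficient of lambda in equation (1), up to sign
det = a * d - b * b    # constant term in equation (1)

# Roots of lambda^2 - trace * lambda + det = 0.
disc = math.sqrt(trace * trace - 4.0 * det)
lambda_1 = (trace + disc) / 2.0   # larger eigenvalue
lambda_2 = (trace - disc) / 2.0   # smaller eigenvalue

print(f"trace = {trace:.2f}, det = {det:.2f}")
print(f"lambda_1 = {lambda_1:.2f}, lambda_2 = {lambda_2:.6f}")
```

For larger matrices the characteristic polynomial is rarely solved directly; a numerical eigenvalue routine from a linear algebra library is the usual choice.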

The first principal component is the eigenvector belonging to the largest eigenvalue, the second principal component is the eigenvector belonging to the second-largest eigenvalue, and so on.
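Each eigenvalue measures the variance along its principal component, so the share of an eigenvalue in the total tells how much of the data's variance that component explains. The sketch below uses the eigenvalues quoted above; the helper `components_needed` and the 0.95 threshold are assumptions made for illustration, not part of the original worked example.

```python
# Eigenvalues as reported in the text, sorted from largest to smallest.
eigenvalues = [1058.89, 0.411375]

total_variance = sum(eigenvalues)
explained = [ev / total_variance for ev in eigenvalues]

def components_needed(ratios, target):
    """Smallest number of leading components whose cumulative share reaches target."""
    cumulative = 0.0
    for count, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative >= target:
            return count
    return len(ratios)

# 0.95 is an assumed precision target (keep about 95% of the variance).
n = components_needed(explained, target=0.95)

print([round(r, 4) for r in explained])
print("components kept:", n)
```

For this data set the first component alone carries almost all of the variance, so keeping n = 1 of the N = 2 components already meets the target.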


Step 3:    Compute the eigenvectors corresponding to the eigenvalues.


        $(A_{(2 X 2)} - \lambda_{i} I) U_{i} = 0$   .  .  .  (2)


        When $\lambda_{1} = 1058.89$, equation (2) becomes

$\begin{bmatrix} -1050.64 & 90.75 \\ 90.75 & -7.84 \end{bmatrix} \begin{bmatrix} u_{1} \\ u_{2} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$


Equating the corresponding entries on both sides, we get:

$-1050.64 * u_{1} + 90.75 * u_{2} = 0 $     . .  . (3)
and      $ 90.75 * u_{1} - 7.84 * u_{2} = 0 $     ... (4)

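The eigenvectors obtained from equations (3) and (4) are as follows:

$\begin{bmatrix} u_{1} \\ u_{2} \end{bmatrix} = \begin{bmatrix} 90.75 k \\ 1050.64 k \end{bmatrix}$ OR $\begin{bmatrix} 7.84 k \\ 90.75 k \end{bmatrix}$

where $k$ is an arbitrary non-zero constant; here we take $k = 1$. Both forms point in the same direction, up to rounding.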

        When $\lambda_{2} = 0.411375$, equation (2) becomes

$\begin{bmatrix} 7.838625 & 90.75 \\ 90.75 & 1050.638625 \end{bmatrix} \begin{bmatrix} u_{1} \\ u_{2} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$


Equating the corresponding entries on both sides, we get:

$7.838625 * u_{1} + 90.75 * u_{2} = 0 $     .   .   .  (5)
and   
 $ 90.75 * u_{1} + 1050.638625 * u_{2} = 0 $ . . . (6)

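The eigenvectors obtained from equations (5) and (6) are as follows:

$\begin{bmatrix} u_{1} \\ u_{2} \end{bmatrix} = \begin{bmatrix} 90.75 k \\ -7.84 k \end{bmatrix}$ OR $\begin{bmatrix} 1050.64 k \\ -90.75 k \end{bmatrix}$

where $k$ is an arbitrary non-zero constant; here we take $k = 1$. Both forms point in the same direction, up to rounding.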
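A quick way to check the two eigenvectors is to confirm that multiplying by the covariance matrix only rescales each of them by its eigenvalue, and that the two directions are perpendicular, as expected for a symmetric matrix. The sketch below takes k = 1 and uses the unrounded entry 7.838625 from equation (5); the helper names are illustrative.

```python
import math

A = [[8.25, 90.75],
     [90.75, 1051.05]]

# (eigenvalue, eigenvector) pairs from the text, with k = 1.
pairs = [
    (1058.89, (90.75, 1050.64)),
    (0.411375, (90.75, -7.838625)),
]

def mat_vec(m, v):
    """Product of a 2 x 2 matrix and a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def residual(m, lam, v):
    """Largest entry of |A v - lambda v|, relative to the largest entry of |lambda v|."""
    av = mat_vec(m, v)
    scale = max(abs(lam * v[0]), abs(lam * v[1]))
    return max(abs(av[0] - lam * v[0]), abs(av[1] - lam * v[1])) / scale

residuals = [residual(A, lam, v) for lam, v in pairs]

# Cosine of the angle between the two eigenvectors (close to 0 if perpendicular).
(v1, v2) = (pairs[0][1], pairs[1][1])
dot = v1[0] * v2[0] + v1[1] * v2[1]
cosine = dot / (math.hypot(*v1) * math.hypot(*v2))

print(residuals, cosine)
```

The residuals are small but not exactly zero, because the eigenvalues and vector entries in the text are rounded.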

Step 4:  Compute the normalized eigenvectors.
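Normalizing an eigenvector means dividing it by its Euclidean length, $\sqrt{u_{1}^{2} + u_{2}^{2}}$, so that the result has length 1. A minimal sketch, assuming the Step 3 eigenvectors with k = 1 (the function name `normalize` is chosen here for illustration):

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

# Unnormalized eigenvectors from Step 3, taking k = 1.
u_first = (90.75, 1050.64)       # belongs to lambda_1
u_second = (90.75, -7.838625)    # belongs to lambda_2

e1 = normalize(u_first)
e2 = normalize(u_second)

print([round(c, 4) for c in e1])
print([round(c, 4) for c in e2])
```

The choice of k no longer matters after this step: any positive k gives the same unit vector, and a negative k only flips its sign.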








