## How to create a Pearson correlation coefficient matrix

In statistics, the Pearson correlation coefficient is widely used to measure the linear relationship between random variables. In this tutorial, we will learn how to create a correlation matrix and represent it as a heat map.

## Pearson correlation coefficient

Let's assume $X_1, X_2, \dots, X_m$ are random variables. The Pearson correlation coefficient matrix $R$ is an $m \times m$ matrix whose entry $R_{ij}$ is the correlation between variables $X_i$ and $X_j$.
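Here is a minimal NumPy sketch of this definition. It assumes the data is stored as a 2-D array with one row per observation and one column per variable. The helpers `pearson_r` and `pearson_matrix` are hypothetical names, and the sample values are made up for illustration. Each entry is the two-variable Pearson coefficient (using $n - 1$ sample statistics) of a pair of columns, and the result can be checked against NumPy's `np.corrcoef(data, rowvar=False)`.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays using sample (n - 1) statistics."""
    n = len(a)
    cov = np.sum((a - a.mean()) * (b - b.mean())) / (n - 1)
    return cov / (a.std(ddof=1) * b.std(ddof=1))

def pearson_matrix(data):
    """m x m matrix whose (i, j) entry is r between column i and column j of data."""
    m = data.shape[1]
    R = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            R[i, j] = pearson_r(data[:, i], data[:, j])
    return R

# Hypothetical sample: 5 observations (rows) of 3 variables (columns)
data = np.array([
    [1.0, 2.0, 9.0],
    [2.0, 4.1, 7.5],
    [3.0, 5.9, 7.0],
    [4.0, 8.2, 4.0],
    [5.0, 9.8, 3.5],
])

R = pearson_matrix(data)
print(R)
```

Because $r_{XY} = r_{YX}$ and $r_{XX} = 1$, the matrix is symmetric with ones on the diagonal. In practice, only the upper triangle needs to be computed.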

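The Pearson correlation coefficient between two variables $X$ and $Y$ is given as follows:

$$r_{XY} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \, \sigma_Y} \tag{1}$$

Here, $\operatorname{cov}(X,Y)$ is the covariance between $X$ and $Y$, defined as

$$\operatorname{cov}(X,Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

where $\bar{x}$ and $\bar{y}$ are the means of $X$ and $Y$, and $n$ is the number of points in the sample. $\sigma_X$ and $\sigma_Y$ are the sample standard deviations, for example $\sigma_X = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$.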
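A quick hand calculation makes formula (1) concrete. The sample $x = (1, 2, 3)$, $y = (1, 3, 2)$ is made up purely for illustration. Both the covariance and the standard deviations use the $n - 1$ denominator.

$$
\begin{aligned}
\bar{x} &= 2, \qquad \bar{y} = 2 \\
\operatorname{cov}(X,Y) &= \frac{(-1)(-1) + (0)(1) + (1)(0)}{3-1} = \frac{1}{2} \\
\sigma_X &= \sqrt{\frac{(-1)^2 + 0^2 + 1^2}{3-1}} = 1, \qquad \sigma_Y = \sqrt{\frac{(-1)^2 + 1^2 + 0^2}{3-1}} = 1 \\
r_{XY} &= \frac{1/2}{1 \cdot 1} = 0.5
\end{aligned}
$$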

### Python Code for Pearson correlation coefficient

```python
import numpy as np

x = np.array([12, 55, 23, 67, 86, 34])
y = np.array([34, 90, 56, 134, 162, 78])

# Calculate mean for x and y
x_bar = np.mean(x)
y_bar = np.mean(y)

# Number of data points in x
n = len(x)

# Numerator and denominator for the sample covariance
cov_numerator = np.sum((x - x_bar) * (y - y_bar))
cov_denominator = n - 1

# Covariance computation
cov = cov_numerator / cov_denominator

# Sample standard deviation computation
x_diff = (x - x_bar) ** 2
y_diff = (y - y_bar) ** 2
std_x = np.sqrt(np.sum(x_diff) / (n - 1))
std_y = np.sqrt(np.sum(y_diff) / (n - 1))

# Pearson correlation coefficient, formula (1)
cor = cov / (std_x * std_y)
print('Correlation coefficient:', cor)
```
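As a sanity check, the manual result can be compared with NumPy's built-ins. This sketch reuses the same `x` and `y` arrays and relies only on the standard NumPy functions `np.corrcoef`, `np.cov` and `np.std`. The population-statistics variant is included to show that using $n$ instead of $n - 1$ does not change $r$, because the factor cancels in the ratio.

```python
import numpy as np

x = np.array([12, 55, 23, 67, 86, 34])
y = np.array([34, 90, 56, 134, 162, 78])

# np.corrcoef returns the 2x2 correlation matrix of x and y;
# the off-diagonal entry is the Pearson coefficient r_xy
r_numpy = np.corrcoef(x, y)[0, 1]

# Same value from the sample covariance and sample standard deviations (ddof=1)
cov_xy = np.cov(x, y)[0, 1]  # np.cov divides by n - 1 by default
r_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Population statistics (divide by n) give the same r: the 1/n factors cancel
r_population = np.mean((x - x.mean()) * (y - y.mean())) / (np.std(x) * np.std(y))

print('np.corrcoef:       ', r_numpy)
print('sample formula:    ', r_manual)
print('population formula:', r_population)
```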

## Pearson correlation coefficient matrix as a heat map

We will use the iris dataset to generate a correlation coefficient matrix and display it as a heat map.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Read iris dataset (expects iris.csv in the working directory)
dataset = pd.read_csv('iris.csv')

# Print attribute names
print(dataset.columns.values)

# Drop the non-numeric 'species' column
df = dataset.drop(columns='species')

# Generate Pearson correlation matrix
cor = df.corr(method='pearson')
print(cor)

# Plot the correlation matrix as a heat map
cm = plt.cm.viridis
sns.heatmap(cor, cmap=cm, linewidths=0.1, linecolor='white', annot=True)
plt.show()
```
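`DataFrame.corr(method='pearson')` computes every pairwise coefficient at once. One way to see what it produces is the matrix form. Standardize each column to zero mean and unit sample standard deviation to get $Z$, then compute $R = Z^\top Z / (n-1)$. This sketch uses a small made-up DataFrame, because `iris.csv` may not be at hand. The column names `a`, `b`, `c`, their values, and the helper `correlation_matrix` are hypothetical. The result is checked against `df.corr`.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data standing in for the iris measurements
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'b': [2.5, 1.0, 3.5, 2.0, 4.0, 3.0],
    'c': [6.1, 5.2, 3.9, 3.1, 2.2, 0.8],
})

def correlation_matrix(values):
    """Pearson correlation matrix of the columns of a 2-D array (n rows, m columns)."""
    n = values.shape[0]
    # Standardize each column with its sample mean and sample std (ddof=1)
    z = (values - values.mean(axis=0)) / values.std(axis=0, ddof=1)
    # Entry (i, j) is sum(z_i * z_j) / (n - 1), i.e. r between columns i and j
    return z.T @ z / (n - 1)

R = correlation_matrix(df.to_numpy())
print(pd.DataFrame(R, index=df.columns, columns=df.columns))
```

The heat map then colors each entry of this matrix. With `annot=True`, seaborn also writes the numeric value in each cell.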

Output

```
['sepal_length' 'sepal_width' 'petal_length' 'petal_width' 'species']
              sepal_length  sepal_width  petal_length  petal_width
sepal_length      1.000000    -0.109369      0.871754     0.817954
sepal_width      -0.109369     1.000000     -0.420516    -0.356544
petal_length      0.871754    -0.420516      1.000000     0.962757
petal_width       0.817954    -0.356544      0.962757     1.000000
```