How to Calculate Mahalanobis Distance in R

The Mahalanobis distance is the distance between two points in a multivariate space.

It's often used to find outliers in statistical analyses that involve several variables.

This tutorial explains how to calculate the Mahalanobis distance in R.

Example: Mahalanobis Distance in R

Use the following steps to calculate the Mahalanobis distance for every observation in a dataset in R.

Step 1: Create the dataset.

First, we'll create a dataset that shows the exam score (rating) of 20 students along with the number of hours they spent studying, the number of prep exams they took, and their current grade in the course:

#create data
df <- data.frame(rating = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99, 95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
                 hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5, 2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
                 prep = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
                 grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93, 89, 82, 95, 94, 81, 93, 93, 90, 89, 89))

#view first six rows of data
head(df)

  rating hours prep grade
1    91    16    3    70
2    93     6    4    88
3    72     3    0    80
4    87     1    3    83
5    86     2    4    88
6    73     3    0    84

Step 2: Calculate the Mahalanobis distance for each observation.

Next, we'll use the built-in mahalanobis() function in R to calculate the Mahalanobis distance for each observation, which uses the following syntax:

mahalanobis(x, center, cov)

where:

  • x: matrix of data
  • center: mean vector of the distribution
  • cov: covariance matrix of the distribution
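
For reference, the quantity these three arguments describe is usually written as below. The symbols are our own notation, not taken from the tutorial: x is one row of the data, \mu is the center, and \Sigma is the covariance matrix. Note that R's mahalanobis() returns the squared distance D^2, not its square root.

```latex
D^2(x) = (x - \mu)^{\top} \, \Sigma^{-1} \, (x - \mu)
```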

The following code shows how to implement this function for our dataset:

#calculate Mahalanobis distance for each observation
mahalanobis(df, colMeans(df), cov(df))

 [1] 16.5019630  2.6392864  4.8507973  5.2012612  3.8287341  4.0905633
 [7]  4.2836303  2.4198736  1.6519576  5.6578253  3.9658770  2.9350178
[13]  2.8102109  4.3682945  1.5610165  1.4595069  2.0245748  0.7502536
[19]  2.7351292  2.2642268
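
To see what mahalanobis() is doing, the same values can be rebuilt by hand from the column means and the inverse covariance matrix. This is a minimal sketch rather than part of the original tutorial; the names x, mu, sigma_inv, centered, d2_manual and d2_builtin are our own. It also makes visible that the function returns squared distances.

```r
# Recreate the example data so this block runs on its own
df <- data.frame(rating = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99, 95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
                 hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5, 2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
                 prep = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
                 grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93, 89, 82, 95, 94, 81, 93, 93, 90, 89, 89))

x <- as.matrix(df)            # numeric matrix, one row per observation
mu <- colMeans(df)            # mean vector (the "center")
sigma_inv <- solve(cov(df))   # inverse of the sample covariance matrix

# Subtract the column means from every row
centered <- sweep(x, 2, mu)

# Row-wise quadratic form: (x - mu)' Sigma^-1 (x - mu), i.e. the squared distance
d2_manual <- rowSums((centered %*% sigma_inv) * centered)

# Compare with the built-in function
d2_builtin <- mahalanobis(df, mu, cov(df))
all.equal(unname(d2_manual), unname(d2_builtin))
```

If the two vectors agree, all.equal() returns TRUE. Taking sqrt(d2_builtin) gives the distance on the original (unsquared) scale if that is what an analysis needs.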

Step 3: Calculate the p-value for each Mahalanobis distance.
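
The text available here stops at this heading, so what follows is only a sketch of one common approach and not necessarily the author's. It assumes the data are roughly multivariate normal, in which case the squared Mahalanobis distance is approximately chi-square distributed with degrees of freedom equal to the number of variables (4 here). The column names mahal and p, and the helper vector vars, are illustrative choices of ours.

```r
# Recreate the example data so this block runs on its own
df <- data.frame(rating = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99, 95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
                 hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5, 2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
                 prep = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2, 3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
                 grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93, 89, 82, 95, 94, 81, 93, 93, 90, 89, 89))

# Variables used for the distance (kept explicit so new columns are not included)
vars <- c("rating", "hours", "prep", "grade")

# Squared Mahalanobis distance for each observation
df$mahal <- mahalanobis(df[vars], colMeans(df[vars]), cov(df[vars]))

# Upper-tail chi-square probability; degrees of freedom = number of variables
# (assumption: other choices of degrees of freedom appear in practice)
df$p <- pchisq(df$mahal, df = length(vars), lower.tail = FALSE)

# Show the observations with the smallest p-values first
head(df[order(df$p), ])
```

Observations with very small p-values are candidates for outliers; the cutoff is a judgment call that depends on the analysis.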
