Pearson's Product-Moment Correlation Coefficient

A statistical measure that quantifies the strength and direction of the linear relationship between two continuous interval or ratio variables.

Core idea

Overview

Pearson's r takes a value between -1 and +1: +1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear correlation. In geographical research, it is used to test hypotheses about how two variables, such as distance from a CBD and property prices, covary across a landscape. The coefficient only measures linear association, and significance tests on r additionally assume that both variables are approximately normally distributed.

When to use: Use when analyzing two sets of interval or ratio data to determine if a linear trend exists between them.

Why it matters: It allows geographers to move beyond visual inspection of scatter graphs by quantifying the strength and direction of a relationship between environmental or social variables, which can then be tested for statistical significance.

Symbols

Variables

r = Correlation Coefficient, n = Sample size, x = Variable 1 data points, y = Variable 2 data points

Walkthrough

Derivation

Derivation of Pearson's Product-Moment Correlation Coefficient

The formula is derived from the definition of the correlation coefficient as the covariance of two variables divided by the product of their standard deviations. It simplifies the algebraic expression of the Pearson coefficient for easier computational use.

Assumptions:

  • The relationship between the two variables is linear.
  • The data points are paired as (x, y) observations.
  • The variables are measured on an interval or ratio scale.

1

Defining the Correlation Coefficient

Start with the population definition where r is the covariance divided by the product of the standard deviations.

Note: The 1/n factors in the covariance and in the standard deviations cancel during simplification.
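
The equation for this step is not shown on this page. A reconstruction from the description, assuming the usual notation (x-bar and y-bar for the means, sigma for the population standard deviations), would be:

```latex
r = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}
  = \frac{\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})}
         {\sqrt{\frac{1}{n}\sum (x - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y - \bar{y})^2}}
  = \frac{\sum (x - \bar{x})(y - \bar{y})}
         {\sqrt{\sum (x - \bar{x})^2 \,\sum (y - \bar{y})^2}}
```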

2

Expanding the Covariance Term

Expand the brackets and apply the summation to each term, using the property that the sum of the mean is n times the mean.

Note: Recall that Σx = n × x-bar and Σy = n × y-bar.
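
The expansion itself is not shown on this page. Written out under the same notation (x-bar and y-bar for the means), it would look like this:

```latex
\sum (x - \bar{x})(y - \bar{y})
  = \sum xy - \bar{y}\sum x - \bar{x}\sum y + n\,\bar{x}\,\bar{y}
  = \sum xy - n\,\bar{x}\,\bar{y} - n\,\bar{x}\,\bar{y} + n\,\bar{x}\,\bar{y}
  = \sum xy - n\,\bar{x}\,\bar{y}
```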

3

Simplifying the Covariance Expression

Substitute the definitions of the means (x-bar and y-bar) into the expanded covariance expression to clear the denominators.

Note: This creates the numerator of the final formula.
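
As a sketch of this substitution, using x-bar = Σx / n and y-bar = Σy / n, the covariance sum can be written over a single denominator of n:

```latex
\sum xy - n\,\bar{x}\,\bar{y}
  = \sum xy - n \cdot \frac{\sum x}{n} \cdot \frac{\sum y}{n}
  = \sum xy - \frac{(\sum x)(\sum y)}{n}
  = \frac{n\sum xy - (\sum x)(\sum y)}{n}
```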

4

Simplifying the Variance Denominator

Apply the same algebraic expansion to the variance terms for x and y. When substituted back into the denominator, the 'n' factors cancel out.

Note: Calculate the sum of the squares, Σx², and the square of the sum, (Σx)², separately to avoid errors.
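
The same expansion applied to the spread of x (and identically to y) can be sketched as follows. The final line shows where the remaining 1/n in the denominator comes from, which cancels against the 1/n in the numerator:

```latex
\sum (x - \bar{x})^2
  = \sum x^2 - n\,\bar{x}^2
  = \sum x^2 - \frac{(\sum x)^2}{n}
  = \frac{n\sum x^2 - (\sum x)^2}{n}

\sqrt{\sum (x - \bar{x})^2 \,\sum (y - \bar{y})^2}
  = \frac{1}{n}\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}
```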

5

Final Assembly

Combine the simplified numerator and denominator to arrive at the computational formula.

Note: This form is often called the 'computational formula' because it is more efficient for manual calculation.

Result
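
The result equation is not shown on this page. Reconstructed from the numerator quoted under "Signs and relationships" and the derivation steps, the computational formula would read:

```latex
r = \frac{n\sum xy - (\sum x)(\sum y)}
         {\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
```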

Source: AQA/Edexcel A-Level Geography Specification - Quantitative Skills: Statistical Analysis

Free formulas

Rearrangements

Solve for

Make r the subject

The formula is already defined with r as the subject.

Difficulty: 1/5

Solve for

Make n the subject

Isolating n requires squaring both sides and using the quadratic formula or variable substitution techniques.

Difficulty: 5/5
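
As a sketch of why this is the hardest rearrangement: squaring both sides of the computational formula and collecting powers of n gives a quadratic in n. This treats the sums as fixed known quantities, which is a purely formal assumption because in practice they depend on n. Squaring can also introduce an extraneous root, so each solution should be checked against the original formula.

```latex
n^2\left[r^2 \sum x^2 \sum y^2 - \left(\sum xy\right)^2\right]
  - n\left[r^2\left(\sum x^2 \left(\sum y\right)^2 + \sum y^2 \left(\sum x\right)^2\right)
           - 2\sum xy \sum x \sum y\right]
  + \left(r^2 - 1\right)\left(\sum x\right)^2\left(\sum y\right)^2 = 0
```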

Solve for

Make Σxy the subject

Isolate the numerator term by multiplying by the denominator and rearranging.

Difficulty: 3/5
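
A sketch of this rearrangement, starting from the computational formula: multiply r by the square-root denominator, add (Σx)(Σy), then divide by n.

```latex
\sum xy = \frac{(\sum x)(\sum y)
          + r\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}{n}
```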

Why it behaves this way

Intuition

Think of the data as a cloud of points on a scatter graph. This equation calculates how well those points fit onto a straight line. Imagine trying to draw a 'best-fit' line through the cloud: the numerator measures how much the x and y values 'move together' (covariance), while the denominator acts as a scaling factor (standard deviations) to normalize the result, ensuring the value always sits between -1 and 1 regardless of the units used.

  • Pearson's correlation coefficient (r): The 'tightness' of the linear relationship; +1 or -1 is a perfect line, 0 is a cloud with no discernible slope.
  • Sample size (n): The number of pairs of observations; it acts as an 'averaging' agent in the calculation.
  • Sum of the products (Σxy): Captures the interaction between variables; if x and y are both high or both low, this sum is large and positive.
  • Sum of squares (Σx², Σy²): Represents the total 'spread' or variance of each individual variable.

Signs and relationships

  • Numerator (nΣxy - (Σx)(Σy)): If the numerator is positive, x and y increase together (positive correlation). If negative, one increases as the other decreases (negative correlation).
  • Square root in denominator: This forces the result into the -1 to +1 range by dividing the covariance by the product of the two variables' individual standard deviations (normalization).
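
The sign and normalization behaviour described above can be checked with a short Python sketch. The function name `pearson_r` and the distance and price figures are hypothetical, invented purely for illustration. The sketch computes r for distance in kilometres and again in metres.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from paired observations, using the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data: distance from the CBD (km) and property price (thousand GBP).
distance_km = [1.0, 2.0, 4.0, 6.0, 9.0]
price_k = [420.0, 390.0, 310.0, 280.0, 200.0]

r_km = pearson_r(distance_km, price_k)

# Same distances expressed in metres: the units change, r should not.
distance_m = [d * 1000 for d in distance_km]
r_m = pearson_r(distance_m, price_k)

print(r_km, r_m)
```

Because price falls as distance rises, the numerator and therefore r are negative. Rescaling the distances changes the numerator and denominator by the same factor, so the two values agree up to floating-point rounding.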

One free problem

Practice Problem

Given a small sample where n=5, Σx=15, Σy=20, Σxy=70, Σx²=55, and Σy²=90, calculate Pearson's r.

  • n (sample size) = 5
  • Σxy = 70
  • Σx = 15
  • Σy = 20
  • Σx² = 55
  • Σy² = 90

Solve for: r

Hint: Calculate the numerator first, then the denominator parts separately.
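
One way to follow the hint is a minimal Python sketch that works directly from the given sums. The function name `pearson_r_from_sums` and its parameter names are hypothetical, chosen here for illustration.

```python
from math import sqrt


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r from precomputed sums (computational formula)."""
    numerator = n * sum_xy - sum_x * sum_y   # n*Sxy - Sx*Sy
    denom_x = n * sum_x2 - sum_x ** 2        # n*Sx2 - (Sx)^2
    denom_y = n * sum_y2 - sum_y ** 2        # n*Sy2 - (Sy)^2
    return numerator / sqrt(denom_x * denom_y)


r = pearson_r_from_sums(n=5, sum_x=15, sum_y=20, sum_xy=70, sum_x2=55, sum_y2=90)
print(r)  # 1.0
```

Here the numerator is 5 × 70 - 15 × 20 = 50 and both denominator parts equal 50, so r = 50 / √(50 × 50) = 1, a perfect positive linear relationship.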

Where it shows up

Real-World Context

Investigating the correlation between the distance of settlements from a river (x) and the average annual flood depth (y) to determine flood risk zones.

Study smarter

Tips

  • Always plot a scatter graph first to check for linearity before calculating r.
  • Ensure that your sample size (n) is sufficiently large to avoid skewed results from outliers.
  • Remember that correlation does not imply causation.

Avoid these traps

Common Mistakes

  • Forgetting to square the sum (Σx)² versus summing the squares Σx².
  • Applying the test to non-linear relationships (e.g., exponential growth patterns).
  • Ignoring the impact of extreme outliers which can heavily bias the result.
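
The first and third mistakes can be illustrated with a small Python sketch. All values below are hypothetical and chosen only to make the effects easy to see; `pearson_r` is an illustrative helper, not a library function.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from paired observations, using the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Mistake 1: the sum of the squares is not the square of the sum.
xs = [1, 2, 3, 4, 5]
sum_of_squares = sum(x * x for x in xs)  # 1 + 4 + 9 + 16 + 25 = 55
square_of_sum = sum(xs) ** 2             # 15 squared = 225

# Mistake 3: a single extreme outlier can change r dramatically.
x_clean = [1, 2, 3, 4, 5, 6]
y_clean = [2, 1, 4, 3, 6, 5]
r_clean = pearson_r(x_clean, y_clean)

# Append one hypothetical outlier at (20, 0).
r_outlier = pearson_r(x_clean + [20], y_clean + [0])

print(sum_of_squares, square_of_sum)
print(r_clean, r_outlier)
```

In this made-up data set the six original points give a strong positive r, while adding the single outlier is enough to make r negative.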

References

Sources

  1. Pearson, K. (1896). Mathematical Contributions to the Theory of Evolution.
  2. Burt, J. E., Barber, G. M., & Rigby, D. L. (2009). Elementary Statistics for Geographers.
  3. AQA/Edexcel A-Level Geography Specification - Quantitative Skills: Statistical Analysis