Point-Biserial Correlation Coefficient (rpb)

Calculates the point-biserial correlation coefficient, measuring the association between a dichotomous and a continuous variable.

r_{pb} = \frac{\bar{Y}_{1} - \bar{Y}_{0}}{s_{Y}} \sqrt{\frac{n_{1} n_{0}}{n^{2}}}

Core idea

Overview

The point-biserial correlation coefficient ($r_{pb}$) is a measure of association used when one variable is dichotomous (binary, e.g., pass/fail, male/female) and the other is continuous (e.g., test score, height). It quantifies the strength and direction of the linear relationship between the two variables. In effect, it expresses how far apart the two group means of the continuous variable are relative to its overall spread, and testing whether $r_{pb}$ differs from zero is equivalent to an independent samples t-test on those two means. It is Pearson's r computed with the dichotomous variable coded numerically (for example, 0 and 1).
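The connection between $r_{pb}$ and a difference in group means can be checked numerically. The sketch below converts $r_{pb}$ to a t statistic with $t = r \sqrt{(n - 2) / (1 - r^{2})}$ and compares it with the pooled-variance independent samples t statistic. The scores are made up for illustration, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev, variance

def point_biserial(y1, y0):
    """r_pb from two groups of scores, using the SD that divides by n."""
    n1, n0 = len(y1), len(y0)
    n = n1 + n0
    s_y = pstdev(y1 + y0)
    return (mean(y1) - mean(y0)) / s_y * math.sqrt(n1 * n0 / n**2)

def t_from_r(r, n):
    """t statistic implied by a correlation r on n observations (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r**2))

def pooled_t(y1, y0):
    """Independent samples t statistic with pooled variance."""
    n1, n0 = len(y1), len(y0)
    sp2 = ((n1 - 1) * variance(y1) + (n0 - 1) * variance(y0)) / (n1 + n0 - 2)
    return (mean(y1) - mean(y0)) / math.sqrt(sp2 * (1 / n1 + 1 / n0))

# Made-up illustrative scores
group1 = [72.0, 80.0, 68.0, 90.0, 75.0]
group0 = [60.0, 65.0, 58.0, 70.0]

r = point_biserial(group1, group0)
n = len(group1) + len(group0)
print(math.isclose(t_from_r(r, n), pooled_t(group1, group0)))
```

The two statistics agree up to floating-point error, so a significance test on $r_{pb}$ and a t-test on the two means reach the same conclusion.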

When to use: Apply this formula when you want to determine the correlation between a naturally dichotomous variable (e.g., correct/incorrect answer on a test item) and a continuous variable (e.g., total test score). It's commonly used in psychometrics for item analysis to see how well individual test items discriminate between high and low overall performers.

Why it matters: The point-biserial correlation is crucial in educational and psychological testing for evaluating the quality of test items. A high positive $r_{pb}$ for an item indicates that those who scored high on the overall test tended to answer that item correctly, suggesting it's a good discriminator. It helps refine tests, ensuring they effectively measure the intended construct.
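As a rough illustration of item analysis, the sketch below computes $r_{pb}$ between each item's correct/incorrect column and the total score. The data are invented, and `item_discrimination` is a hypothetical helper name. The total here includes the item itself; some analysts correlate with the total excluding the item instead.

```python
import math
from statistics import mean, pstdev

def item_discrimination(item, totals):
    """r_pb between a 0/1 item column and the total scores."""
    y1 = [t for x, t in zip(item, totals) if x == 1]
    y0 = [t for x, t in zip(item, totals) if x == 0]
    n1, n0, n = len(y1), len(y0), len(totals)
    if n1 == 0 or n0 == 0:
        raise ValueError("item needs both correct and incorrect responses")
    return (mean(y1) - mean(y0)) / pstdev(totals) * math.sqrt(n1 * n0 / n**2)

# Invented data: total scores for 8 students and two items (1 = correct)
totals = [12.0, 15.0, 9.0, 18.0, 7.0, 14.0, 16.0, 10.0]
item_a = [1, 1, 0, 1, 0, 1, 1, 0]  # answered correctly by the higher scorers
item_b = [0, 1, 1, 0, 1, 0, 1, 0]  # correct answers spread across the range

for name, item in [("A", item_a), ("B", item_b)]:
    print(name, round(item_discrimination(item, totals), 3))
```

In this toy data item A discriminates well (large positive value), while item B comes out slightly negative, which would flag it for review.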

Symbols

Variables

$\bar{Y}_{1}$ = Mean of Continuous Variable for Group 1
$\bar{Y}_{0}$ = Mean of Continuous Variable for Group 0
$s_{Y}$ = Standard Deviation of Continuous Variable (Overall), computed with $n$ in the denominator so that the formula matches Pearson's r exactly
$n_{1}$ = Sample Size for Group 1
$n_{0}$ = Sample Size for Group 0
$n$ = Total Sample Size ($n = n_{1} + n_{0}$)
$r_{pb}$ = Point-Biserial Correlation Coefficient

Walkthrough

Derivation

Formula: Point-Biserial Correlation Coefficient (rpb)

The point-biserial correlation measures the linear relationship between a dichotomous and a continuous variable.

Significance tests on $r_{pb}$ rest on the following assumptions (the algebra below holds without them):
The continuous variable is approximately normally distributed within each of the two groups defined by the dichotomous variable.
The variance of the continuous variable is approximately equal in both groups (homoscedasticity).
The relationship between the variables is linear.

Start with Pearson's r definition:

The point-biserial correlation is a special case of Pearson's product-moment correlation coefficient. We start with its general formula, where X and Y are the two variables.

r = \frac{\sum (X_{i} - \bar{X}) (Y_{i} - \bar{Y})}{\sqrt{\sum (X_{i} - \bar{X})^{2} \sum (Y_{i} - \bar{Y})^{2}}}

Substitute Dichotomous Variable for X:

Let the dichotomous variable X be coded as 0 and 1. This simplifies the terms involving X in the Pearson's r formula. The mean of X, $\bar{X}$, becomes $n_{1} / n$ (the proportion of observations in Group 1).

X_{i} \in \{0, 1\}

Simplify Terms for Dichotomous X:

When X is dichotomous (0 or 1), the variance of X simplifies to $p (1 - p)$, where $p = n_{1} / n$. Thus, $\sum (X_{i} - \bar{X})^{2} = n \cdot p (1 - p) = n \cdot (n_{1} / n) (n_{0} / n) = n_{1} n_{0} / n$.

\sum (X_{i} - \bar{X})^{2} = \frac{n_{1} n_{0}}{n}
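The same result can be reached by expanding the sum directly. For 0/1 coding, $X_{i}^{2} = X_{i}$, so $\sum X_{i}^{2} = n_{1}$:

```latex
\sum (X_{i} - \bar{X})^{2}
  = \sum X_{i}^{2} - n \bar{X}^{2}
  = n_{1} - n \left( \frac{n_{1}}{n} \right)^{2}
  = \frac{n_{1} (n - n_{1})}{n}
  = \frac{n_{1} n_{0}}{n}
```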

Relate Covariance to Mean Difference:

The numerator, the sum of products, simplifies to $n_{1} (\bar{Y}_{1} - \bar{Y})$, where $\bar{Y}_{1}$ is the mean of Y for Group 1 and $\bar{Y}$ is the overall mean of Y. This term can be further expressed as $n_{1} n_{0} (\bar{Y}_{1} - \bar{Y}_{0}) / n$.

\sum (X_{i} - \bar{X}) (Y_{i} - \bar{Y}) = n_{1} (\bar{Y}_{1} - \bar{Y}) = \frac{n_{1} n_{0} (\bar{Y}_{1} - \bar{Y}_{0})}{n}
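The intermediate steps behind this simplification can be written out as follows. The first line uses $\sum (Y_{i} - \bar{Y}) = 0$ and the fact that $X_{i}$ is 1 only for Group 1; the second uses the overall mean as a weighted average of the two group means.

```latex
\sum (X_{i} - \bar{X})(Y_{i} - \bar{Y})
  = \sum X_{i} (Y_{i} - \bar{Y}) - \bar{X} \sum (Y_{i} - \bar{Y})
  = \sum_{i : X_{i} = 1} (Y_{i} - \bar{Y})
  = n_{1} (\bar{Y}_{1} - \bar{Y})

\bar{Y} = \frac{n_{1} \bar{Y}_{1} + n_{0} \bar{Y}_{0}}{n}
  \quad \Rightarrow \quad
  \bar{Y}_{1} - \bar{Y} = \frac{n_{0} (\bar{Y}_{1} - \bar{Y}_{0})}{n}

\sum (X_{i} - \bar{X})(Y_{i} - \bar{Y})
  = \frac{n_{1} n_{0} (\bar{Y}_{1} - \bar{Y}_{0})}{n}
```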

Combine and Simplify:

Substitute these terms into Pearson's r. With $s_{Y}$ defined as the overall standard deviation of Y computed with $n$ in the denominator, $\sum (Y_{i} - \bar{Y})^{2} = n s_{Y}^{2}$, so the denominator becomes $\sqrt{(n_{1} n_{0} / n) \cdot n s_{Y}^{2}} = s_{Y} \sqrt{n_{1} n_{0}}$. Dividing the numerator $n_{1} n_{0} (\bar{Y}_{1} - \bar{Y}_{0}) / n$ by this leaves the point-biserial formula.

r_{pb} = \frac{\bar{Y}_{1} - \bar{Y}_{0}}{s_{Y}} \sqrt{\frac{n_{1} n_{0}}{n^{2}}}

Result

r_{pb} = \frac{\bar{Y}_{1} - \bar{Y}_{0}}{s_{Y}} \sqrt{\frac{n_{1} n_{0}}{n^{2}}}
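The equivalence with Pearson's r can be checked on a small data set. This is a minimal sketch with invented values and hypothetical function names; it assumes the square-root form of the result, $r_{pb} = \frac{\bar{Y}_{1} - \bar{Y}_{0}}{s_{Y}} \sqrt{n_{1} n_{0} / n^{2}}$, with $s_{Y}$ computed by dividing by $n$ (`statistics.pstdev`).

```python
import math
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson's r from the sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(x, y):
    """Point-biserial formula for x coded 0/1."""
    y1 = [b for a, b in zip(x, y) if a == 1]
    y0 = [b for a, b in zip(x, y) if a == 0]
    n1, n0, n = len(y1), len(y0), len(y)
    return (mean(y1) - mean(y0)) / pstdev(y) * math.sqrt(n1 * n0 / n**2)

# Invented data: x is the 0/1 group code, y the continuous score
x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y = [14.0, 9.5, 12.0, 16.5, 11.0, 8.0, 13.5, 10.0, 15.0, 12.5]

print(math.isclose(pearson_r(x, y), point_biserial(x, y)))  # prints True
```

If the sample standard deviation (dividing by $n - 1$) were used instead, the result would be smaller by a factor of $\sqrt{(n - 1) / n}$ unless the sample-size term is adjusted to match.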

Source: Gravetter, F. J., & Wallnau, L. B. (2017). Statistics for the Behavioral Sciences (10th ed.). Cengage Learning. Chapter 15: Correlation.

Free formulas

Rearrangements

Solve for $\bar{Y}_{1}$

Point-Biserial Correlation: Make Mean of Group 1 the subject

\bar{Y}_{1} = \frac{r_{pb} s_{Y}}{\sqrt{\frac{n_{1} n_{0}}{n^{2}}}} + \bar{Y}_{0}

To make $\bar{Y}_{1}$ the subject, divide both sides by the square root term, multiply by $s_{Y}$, and then add $\bar{Y}_{0}$.

Difficulty: 2/5

Solve for $\bar{Y}_{0}$

Point-Biserial Correlation: Make Mean of Group 0 the subject

\bar{Y}_{0} = \bar{Y}_{1} - \frac{r_{pb} s_{Y}}{\sqrt{\frac{n_{1} n_{0}}{n^{2}}}}

To make $\bar{Y}_{0}$ the subject, divide both sides by the square root term, multiply by $s_{Y}$, and then subtract the result from $\bar{Y}_{1}$.

Difficulty: 2/5

Solve for $s_{Y}$

Point-Biserial Correlation: Make Standard Deviation the subject

s_{Y} = \frac{\bar{Y}_{1} - \bar{Y}_{0}}{r_{pb}} \sqrt{\frac{n_{1} n_{0}}{n^{2}}}

To make $s_{Y}$ the subject, multiply both sides by $s_{Y}$ and then divide by $r_{p b}$ .

Difficulty: 2/5

Solve for $n_{1}$

Point-Biserial Correlation: Make Sample Size for Group 1 the subject

n_{1} = \frac{n^{2}}{n_{0}} \left( \frac{r_{pb} s_{Y}}{\bar{Y}_{1} - \bar{Y}_{0}} \right)^{2}

To make $n_{1}$ the subject, first isolate the square root term, square both sides, and then rearrange to solve for $n_{1}$ given $n_{0}$ and $n$ .

Difficulty: 3/5

Solve for $n_{0}$

Point-Biserial Correlation: Make Sample Size for Group 0 the subject

n_{0} = \frac{n^{2}}{n_{1}} \left( \frac{r_{pb} s_{Y}}{\bar{Y}_{1} - \bar{Y}_{0}} \right)^{2}

To make $n_{0}$ the subject, first isolate the square root term, square both sides, and then rearrange to solve for $n_{0}$ given $n_{1}$ and $n$ .

Difficulty: 3/5

Solve for $n$

Point-Biserial Correlation: Make Total Sample Size the subject

n = \frac{(\bar{Y}_{1} - \bar{Y}_{0}) \sqrt{n_{1} n_{0}}}{r_{pb} s_{Y}}

To make $n$ the subject, write the square root term as $\sqrt{n_{1} n_{0}} / n$, multiply both sides by $n$, and then divide by $r_{pb}$ to solve for $n$ given $n_{1}$ and $n_{0}$.

Difficulty: 3/5
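A quick way to check a rearrangement is a round trip: compute $r_{pb}$ from known inputs, then recover one input from the others. The sketch below does this for $\bar{Y}_{1}$ and $s_{Y}$, assuming the square-root form of the formula; the numbers and function names are illustrative only.

```python
import math

def r_pb(y1_bar, y0_bar, s_y, n1, n0):
    """Point-biserial correlation from summary statistics."""
    n = n1 + n0
    return (y1_bar - y0_bar) / s_y * math.sqrt(n1 * n0 / n**2)

def solve_y1_bar(r, y0_bar, s_y, n1, n0):
    """Rearrangement with the Group 1 mean as the subject."""
    n = n1 + n0
    return r * s_y / math.sqrt(n1 * n0 / n**2) + y0_bar

def solve_s_y(r, y1_bar, y0_bar, n1, n0):
    """Rearrangement with the overall standard deviation as the subject."""
    n = n1 + n0
    return (y1_bar - y0_bar) / r * math.sqrt(n1 * n0 / n**2)

# Illustrative inputs
r = r_pb(52.0, 47.0, 8.0, 12, 18)
print(math.isclose(solve_y1_bar(r, 47.0, 8.0, 12, 18), 52.0))
print(math.isclose(solve_s_y(r, 52.0, 47.0, 12, 18), 8.0))
```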

Visual intuition

Graph

With $\bar{Y}_{0}$, $s_{Y}$, $n_{1}$, and $n_{0}$ held fixed, plotting $r_{pb}$ against $\bar{Y}_{1}$ gives a straight line, because the formula is linear in $\bar{Y}_{1}$. As the Group 1 mean rises relative to the Group 0 mean, $r_{pb}$ grows in proportion. The slope is $\sqrt{n_{1} n_{0} / n^{2}} / s_{Y}$, so the sensitivity of the correlation to a change in the group mean is set by the overall spread of the scores and by how the sample splits between the two groups. In real data $s_{Y}$ itself grows as the group means move apart, so the line describes the formula with $s_{Y}$ treated as given.

Graph type: linear
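The linear behaviour can be seen by stepping the Group 1 mean while treating everything else as fixed. This is a sketch with arbitrary values, assuming the square-root form of the formula; `r_pb` is a hypothetical helper.

```python
import math

def r_pb(y1_bar, y0_bar, s_y, n1, n0):
    """Point-biserial correlation from summary statistics."""
    n = n1 + n0
    return (y1_bar - y0_bar) / s_y * math.sqrt(n1 * n0 / n**2)

# Arbitrary fixed values; s_y is treated as given
y0_bar, s_y, n1, n0 = 50.0, 10.0, 25, 25
slope = math.sqrt(n1 * n0 / (n1 + n0) ** 2) / s_y  # 0.5 / 10 = 0.05 per unit

for y1_bar in (50.0, 52.0, 54.0, 56.0):
    print(y1_bar, round(r_pb(y1_bar, y0_bar, s_y, n1, n0), 3))
```

Each 2-point increase in the Group 1 mean raises $r_{pb}$ by the same amount (0.1 with these values), which is the constant slope of the line.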

Why it behaves this way

Intuition

A statistical picture comparing the separation between the central points (means) of two distinct distributions of a continuous variable, each corresponding to one category of a binary variable, normalized by the overall standard deviation.

r_{p b}

The point-biserial correlation coefficient

A measure of the linear association between a dichotomous variable and a continuous variable. A higher absolute value indicates a stronger linear relationship.

\bar{Y}_{1}

The mean of the continuous variable for the first category of the dichotomous variable

Represents the average value of the continuous variable for one specific group defined by the dichotomous variable.

\bar{Y}_{0}

The mean of the continuous variable for the second category of the dichotomous variable

Represents the average value of the continuous variable for the other specific group defined by the dichotomous variable.

s_{Y}

The standard deviation of the continuous variable across the entire sample

Quantifies the overall spread or variability of the continuous scores; a larger spread makes mean differences less pronounced relative to the overall data distribution.

n_{1}

The number of observations in the first category of the dichotomous variable

The count of data points belonging to one group.

n_{0}

The number of observations in the second category of the dichotomous variable

The count of data points belonging to the other group.

n

The total number of observations in the sample (n = n_1 + n_0)

The overall sample size, which affects the reliability of the correlation estimate.

Signs and relationships

$\bar{Y}_{1} - \bar{Y}_{0}$: The sign of this difference directly determines the sign of $r_{pb}$. A positive difference means the mean of Group 1 is higher than that of Group 0, indicating a positive association. A negative difference indicates the opposite.
$s_{Y}$: As the denominator, the standard deviation normalizes the difference between means. A larger overall spread makes the same mean difference smaller in relative terms, reducing the magnitude of $r_{pb}$.
$\sqrt{\frac{n_{1} n_{0}}{n^{2}}}$: This term scales the correlation. It reaches its maximum of 0.5 when the group sizes are equal ($n_{1} = n_{0}$) and shrinks toward 0 as the split becomes more lopsided.
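The effect of the group split on the scaling term can be tabulated for a fixed total. The sketch below uses an arbitrary total of 100 observations; `split_term` is a hypothetical helper name.

```python
import math

def split_term(n1, n0):
    """The factor sqrt(n1 * n0 / n^2) from the point-biserial formula."""
    n = n1 + n0
    return math.sqrt(n1 * n0 / n**2)

# Arbitrary total of 100 observations, split in different proportions
for n1 in (10, 30, 50, 70, 90):
    n0 = 100 - n1
    print(n1, n0, round(split_term(n1, n0), 3))
```

The factor is 0.3 for a 10/90 split, about 0.458 for 30/70, and peaks at 0.5 for 50/50, so the same standardized mean difference yields a smaller $r_{pb}$ when the groups are very unequal in size.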

Free study cues

Insight

Canonical usage

The point-biserial correlation coefficient is a dimensionless statistic, typically reported as a decimal value between -1 and +1.

Common confusion

A common mistake is to interpret correlation coefficients as percentages or to assign units to them. They are unitless measures of association strength, distinct from proportions or absolute differences.

Dimension note

The units of Y cancel in the ratio $(\bar{Y}_{1} - \bar{Y}_{0}) / s_{Y}$, and the sample-size term is a ratio of counts, so $r_{pb}$ carries no units.

Unit systems

$\bar{Y}_{1}$ Unit of the continuous variable Y · The mean of the continuous variable for group 1. Its unit and dimension depend on the nature of Y.

$\bar{Y}_{0}$ Unit of the continuous variable Y · The mean of the continuous variable for group 0. Its unit and dimension must be consistent with $\bar{Y}_{1}$.

$s_{Y}$ Unit of the continuous variable Y · The standard deviation of the continuous variable for the entire sample. Its unit and dimension must be consistent with $\bar{Y}_{1}$ and $\bar{Y}_{0}$.

$n_{1}$ dimensionless · The number of observations in group 1, a count.

$n_{0}$ dimensionless · The number of observations in group 0, a count.

$n$ dimensionless · The total number of observations, a count (n = n_1 + n_0).

One free problem

Practice Problem

In an item analysis, students who answered a question correctly (Group 1) had a mean score of 75, while those who answered incorrectly (Group 0) had a mean score of 60. The overall standard deviation of scores was 10. There were 30 students in Group 1 and 20 in Group 0, with a total of 50 students. Calculate the point-biserial correlation coefficient ( $r_{p b}$ ).

$\bar{Y}_{1}$ = 75
$\bar{Y}_{0}$ = 60
Standard Deviation of Continuous Variable (Overall) = 10
Sample Size for Group 1 = 30
Sample Size for Group 0 = 20
Total Sample Size = 50

Solve for: $r_{pb}$

Hint: Calculate the square root term first, then multiply by the difference in means divided by the standard deviation.
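Following the hint, the arithmetic can be sketched as below. It assumes the formula $r_{pb} = \frac{\bar{Y}_{1} - \bar{Y}_{0}}{s_{Y}} \sqrt{n_{1} n_{0} / n^{2}}$ and that the stated standard deviation of 10 is the overall value computed with $n$ in the denominator; the variable names are illustrative.

```python
import math

# Values from the practice problem
y1_bar, y0_bar = 75, 60
s_y = 10
n1, n0 = 30, 20
n = n1 + n0

root_term = math.sqrt(n1 * n0 / n**2)       # sqrt(600 / 2500) = sqrt(0.24)
r_pb = (y1_bar - y0_bar) / s_y * root_term  # 1.5 * sqrt(0.24)

print(round(root_term, 4), round(r_pb, 4))  # 0.4899 0.7348
```

The result, about 0.73, is a strong positive correlation: students who answered the item correctly tended to score higher overall.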

Where it shows up

Real-World Context

Analyzing if students who answered a specific multiple-choice question correctly (dichotomous) had higher overall exam scores (continuous).

Study smarter

Tips

The dichotomous variable must be truly binary (e.g., 0 or 1).
The continuous variable should be interval or ratio scale.
Values range from -1 to +1, similar to Pearson's r.
A positive $r_{p b}$ means higher scores on the continuous variable are associated with the '1' category of the dichotomous variable.
It's equivalent to Pearson's r if the dichotomous variable is coded as 0 and 1.

Avoid these traps

Common Mistakes

Using it for two continuous variables (use Pearson's r).
Using it for two dichotomous variables (use Phi coefficient).
Misinterpreting the sign of the correlation if the dichotomous variable coding is arbitrary.
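The last point can be demonstrated directly: swapping which category is treated as Group 1 flips the sign of $r_{pb}$ and leaves its magnitude unchanged. The data and the `point_biserial` helper below are made up for illustration.

```python
import math
from statistics import mean, pstdev

def point_biserial(x, y, one):
    """r_pb treating the category `one` as Group 1 and the rest as Group 0."""
    y1 = [b for a, b in zip(x, y) if a == one]
    y0 = [b for a, b in zip(x, y) if a != one]
    n1, n0, n = len(y1), len(y0), len(y)
    return (mean(y1) - mean(y0)) / pstdev(y) * math.sqrt(n1 * n0 / n**2)

# Made-up data with an arbitrary two-category label
group = ["A", "B", "A", "A", "B", "B", "A", "B"]
score = [31.0, 24.0, 28.0, 35.0, 22.0, 27.0, 30.0, 25.0]

r_a = point_biserial(group, score, one="A")
r_b = point_biserial(group, score, one="B")
print(math.isclose(r_a, -r_b))  # prints True
```

When the coding is arbitrary, report which category was coded 1 alongside the coefficient so the sign can be read correctly.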

References

Sources

Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE Publications.
Aron, A., Aron, E. N., & Coups, E. J. (2018). Statistics for Psychology (8th ed.). Pearson.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric Theory (3rd ed.). McGraw-Hill.
Wikipedia: Point-biserial correlation coefficient (Retrieved 2023-10-27).
Gravetter, F. J., & Wallnau, L. B. (2017). Statistics for the Behavioral Sciences (10th ed.). Cengage Learning. Chapter 15: Correlation.

Related Formulas

Pearson Correlation Coefficient (r)

Phi Coefficient

Independent Samples t-test
