Point-Biserial Correlation Coefficient (rpb)
Calculates the point-biserial correlation coefficient, measuring the association between a dichotomous and a continuous variable.
Core idea
Overview
The point-biserial correlation coefficient ($r_{pb}$) is a measure of association used when one variable is dichotomous (binary, e.g., pass/fail, male/female) and the other is continuous (e.g., test score, height). It quantifies the strength and direction of the linear relationship between the two variables. In effect, it expresses how far apart the two group means of the continuous variable are relative to the overall spread; testing whether $r_{pb}$ differs from zero is equivalent to a pooled-variance independent-samples t-test on those means. It is mathematically identical to Pearson's r computed with the dichotomous variable coded numerically (e.g., 0 and 1).
When to use: Apply this formula when you want to determine the correlation between a naturally dichotomous variable (e.g., correct/incorrect answer on a test item) and a continuous variable (e.g., total test score). It's commonly used in psychometrics for item analysis to see how well individual test items discriminate between high and low overall performers.
Why it matters: The point-biserial correlation is crucial in educational and psychological testing for evaluating the quality of test items. A high positive $r_{pb}$ for an item indicates that those who scored high on the overall test tended to answer that item correctly, suggesting it's a good discriminator. It helps refine tests, ensuring they effectively measure the intended construct.
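The link between $r_{pb}$ and a difference in group means can be checked numerically. The sketch below is illustrative only: the scores are invented, the helper names `pearson_r` and `pooled_t` are hypothetical, and it assumes the standard identity $t = r_{pb}\sqrt{(n-2)/(1-r_{pb}^2)}$ for the pooled-variance t statistic.

```python
import math

def pearson_r(x, y):
    """Pearson's r from sums of deviations (hypothetical helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(group1, group0):
    """Independent-samples t statistic using the pooled variance."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = sum(group1) / n1, sum(group0) / n0
    ss1 = sum((v - m1) ** 2 for v in group1)
    ss0 = sum((v - m0) ** 2 for v in group0)
    pooled_var = (ss1 + ss0) / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))

# Invented scores for illustration only.
passed = [78, 85, 69, 90, 74]
failed = [62, 58, 71, 65]

x = [1] * len(passed) + [0] * len(failed)   # dichotomous variable, coded 1/0
y = passed + failed                          # continuous variable
n = len(y)

r_pb = pearson_r(x, y)
t_from_r = r_pb * math.sqrt((n - 2) / (1 - r_pb ** 2))
t_direct = pooled_t(passed, failed)
print(round(r_pb, 4), round(t_from_r, 4), round(t_direct, 4))
```

The two t values agree up to floating-point error, which is why a significance test on $r_{pb}$ and a pooled t-test on the group means lead to the same conclusion.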
Symbols
Variables
$\bar{Y}_1$ = Mean of Continuous Variable for Group 1, $\bar{Y}_0$ = Mean of Continuous Variable for Group 0, $s_Y$ = Standard Deviation of Continuous Variable (Overall), $n_1$ = Sample Size for Group 1, $n_0$ = Sample Size for Group 0
Walkthrough
Derivation
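Formula: Point-Biserial Correlation Coefficient ($r_{pb}$)
$r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_Y} \sqrt{\frac{n_1 n_0}{n^2}}$, where $n = n_1 + n_0$.
The point-biserial correlation measures the linear relationship between a dichotomous and a continuous variable.
Assumptions: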
- The continuous variable is approximately normally distributed within each of the two groups defined by the dichotomous variable.
- The variance of the continuous variable is approximately equal in both groups (homoscedasticity).
- The relationship between the variables is linear.
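A minimal Python sketch of the coefficient computed from summary statistics, assuming the form implied by the terms this page describes (mean difference over the overall standard deviation, times $\sqrt{n_1 n_0 / n^2}$ with $n = n_1 + n_0$). The function name and the example numbers are hypothetical.

```python
import math

def point_biserial_from_summary(mean1, mean0, sd_overall, n1, n0):
    """r_pb from group means, overall SD, and group sizes.

    Assumes sd_overall is the population SD of the continuous variable
    (sum of squares divided by n), which matches the n**2 in the formula.
    """
    if sd_overall <= 0:
        raise ValueError("sd_overall must be positive")
    if n1 <= 0 or n0 <= 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n0
    return (mean1 - mean0) / sd_overall * math.sqrt(n1 * n0 / n ** 2)

# Hypothetical inputs: means 52 and 48, overall SD 8, ten cases per group.
print(point_biserial_from_summary(52, 48, 8, 10, 10))  # 0.25
```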
Start with Pearson's r definition:
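The point-biserial correlation is a special case of Pearson's product-moment correlation coefficient. We start with its general formula, where X and Y are the two variables: $r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$.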
Substitute Dichotomous Variable for X:
Let the dichotomous variable X be coded as 0 and 1. This simplifies the terms involving X in the Pearson's r formula. The mean of X, $\bar{X}$, becomes $p = n_1/n$ (the proportion of cases in group 1).
Simplify Terms for Dichotomous X:
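When X is dichotomous (0 or 1), the variance of X simplifies to $pq$, where $p = n_1/n$ is the proportion of cases in group 1 and $q = 1 - p = n_0/n$. Thus, $s_X = \sqrt{pq} = \sqrt{n_1 n_0 / n^2}$, and $\sum (X_i - \bar{X})^2 = npq = n_1 n_0 / n$.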
Relate Covariance to Mean Difference:
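The numerator, the sum of products $\sum (X_i - \bar{X})(Y_i - \bar{Y})$, simplifies to $n_1(\bar{Y}_1 - \bar{Y})$, where $\bar{Y}_1$ is the mean of Y for group 1, and $\bar{Y}$ is the overall mean of Y. Because $\bar{Y} = (n_1 \bar{Y}_1 + n_0 \bar{Y}_0)/n$, this term can be further expressed as $\frac{n_1 n_0}{n}(\bar{Y}_1 - \bar{Y}_0)$.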
Combine and Simplify:
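Substituting these simplified terms back into Pearson's r formula and performing algebraic simplification leads to the point-biserial formula, where $s_Y$ is the overall standard deviation of the continuous variable Y, computed with $n$ in the denominator so that $\sum (Y_i - \bar{Y})^2 = n s_Y^2$.
Result
$r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_Y} \sqrt{\frac{n_1 n_0}{n^2}}$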
Source: Gravetter, F. J., & Wallnau, L. B. (2017). Statistics for the Behavioral Sciences (10th ed.). Cengage Learning. Chapter 15: Correlation.
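The derivation can be sanity-checked on raw data. The sketch below compares the point-biserial form against Pearson's r computed directly; the data are invented, both function names are hypothetical, and the overall standard deviation is assumed to be the population version (`statistics.pstdev`, dividing by $n$).

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson's r from its general definition."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(x, y):
    """Mean difference over population SD, scaled by sqrt(n1*n0/n^2)."""
    group1 = [b for a, b in zip(x, y) if a == 1]
    group0 = [b for a, b in zip(x, y) if a == 0]
    n1, n0, n = len(group1), len(group0), len(y)
    s_y = statistics.pstdev(y)  # population SD: divides by n, not n - 1
    diff = statistics.mean(group1) - statistics.mean(group0)
    return diff / s_y * math.sqrt(n1 * n0 / n ** 2)

# Invented data for illustration only.
x = [1, 1, 0, 1, 0, 0, 1, 0]
y = [14.0, 17.5, 9.0, 12.5, 11.0, 8.5, 16.0, 10.5]

r_general = pearson_r(x, y)
r_special = point_biserial(x, y)
print(round(r_general, 6), round(r_special, 6))
```

If the sample standard deviation (dividing by $n - 1$) is used instead, the square-root term has to become $\sqrt{n_1 n_0 / (n(n-1))}$ for the two results to match.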
Free formulas
Rearrangements
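Solve for $\bar{Y}_1$
Point-Biserial Correlation: Make Mean of Group 1 the subject
To make $\bar{Y}_1$ the subject, isolate the term containing it by dividing by the square root term, multiplying by $s_Y$, and then adding $\bar{Y}_0$: $\bar{Y}_1 = \bar{Y}_0 + \frac{r_{pb} s_Y}{\sqrt{n_1 n_0 / n^2}}$.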
Difficulty: 2/5
Solve for $\bar{Y}_0$
Point-Biserial Correlation: Make Mean of Group 0 the subject
To make $\bar{Y}_0$ the subject, isolate the term containing it by dividing by the square root term, multiplying by $s_Y$, and then subtracting the result from $\bar{Y}_1$: $\bar{Y}_0 = \bar{Y}_1 - \frac{r_{pb} s_Y}{\sqrt{n_1 n_0 / n^2}}$.
Difficulty: 2/5
Solve for $s_Y$
Point-Biserial Correlation: Make Standard Deviation the subject
To make $s_Y$ the subject, multiply both sides by $s_Y$ and then divide by $r_{pb}$: $s_Y = \frac{\bar{Y}_1 - \bar{Y}_0}{r_{pb}} \sqrt{\frac{n_1 n_0}{n^2}}$.
Difficulty: 2/5
Solve for $n_1$
Point-Biserial Correlation: Make Sample Size for Group 1 the subject
To make $n_1$ the subject, first isolate the square root term, square both sides, and then rearrange to solve for $n_1$ given $n_0$ and $n$: $n_1 = \frac{r_{pb}^2 s_Y^2 n^2}{n_0 (\bar{Y}_1 - \bar{Y}_0)^2}$.
Difficulty: 3/5
Solve for $n_0$
Point-Biserial Correlation: Make Sample Size for Group 0 the subject
To make $n_0$ the subject, first isolate the square root term, square both sides, and then rearrange to solve for $n_0$ given $n_1$ and $n$: $n_0 = \frac{r_{pb}^2 s_Y^2 n^2}{n_1 (\bar{Y}_1 - \bar{Y}_0)^2}$.
Difficulty: 3/5
Solve for $n$
Point-Biserial Correlation: Make Total Sample Size the subject
To make $n$ the subject, first isolate the square root term, square both sides, and then rearrange to solve for $n$ given $n_1$ and $n_0$: $n = \frac{(\bar{Y}_1 - \bar{Y}_0)\sqrt{n_1 n_0}}{r_{pb} s_Y}$.
Difficulty: 3/5
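The rearrangements for the two means and the standard deviation can be checked with a round trip: compute $r_{pb}$ from known inputs, then recover each input from it. This is a sketch under stated assumptions: the function names and numbers are hypothetical, and $n$ is taken to be $n_1 + n_0$, which is why the sample-size rearrangements (which treat $n$ as a separate symbol) are left out.

```python
import math

def split_term(n1, n0):
    """The square-root term sqrt(n1 * n0 / n^2), with n = n1 + n0."""
    n = n1 + n0
    return math.sqrt(n1 * n0 / n ** 2)

def r_pb(mean1, mean0, sd, n1, n0):
    return (mean1 - mean0) / sd * split_term(n1, n0)

def solve_mean1(r, mean0, sd, n1, n0):
    # Divide by the square-root term, multiply by sd, add mean0.
    return mean0 + r * sd / split_term(n1, n0)

def solve_mean0(r, mean1, sd, n1, n0):
    # Same scaled quantity, subtracted from mean1.
    return mean1 - r * sd / split_term(n1, n0)

def solve_sd(r, mean1, mean0, n1, n0):
    # Undefined when r is zero (the means are then equal).
    if r == 0:
        raise ValueError("sd cannot be recovered when r is zero")
    return (mean1 - mean0) * split_term(n1, n0) / r

# Hypothetical inputs for the round trip.
r = r_pb(70, 62, 12, 18, 22)
print(round(r, 4))
print(solve_mean1(r, 62, 12, 18, 22))
print(solve_mean0(r, 70, 12, 18, 22))
print(solve_sd(r, 70, 62, 18, 22))
```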
Visual intuition
Graph
The graph is a straight line because, with the standard deviation and the group sizes held fixed, $r_{pb}$ is a linear function of the mean of the continuous variable for group one. For a psychology student, this means that as the group-one mean increases relative to the group-zero mean, the correlation increases proportionally. The slope of the line is $\frac{1}{s_Y}\sqrt{\frac{n_1 n_0}{n^2}}$, so the sensitivity of the correlation to a change in the group mean depends on the overall spread of the data and on how evenly the cases are split between the two groups.
Graph type: linear
Why it behaves this way
Intuition
Picture two distributions of the continuous variable, one for each category of the binary variable. The coefficient compares the separation between their central points (means), normalized by the overall standard deviation.
Signs and relationships
- $\bar{Y}_1 - \bar{Y}_0$: The sign of this difference directly determines the sign of $r_{pb}$. A positive difference means the mean of group 1 is higher than that of group 0, indicating a positive association. A negative difference indicates the opposite.
- $s_Y$: As a denominator, the standard deviation normalizes the difference between means. A larger overall spread ($s_Y$) makes the same mean difference smaller in relative terms, thus reducing the magnitude of $r_{pb}$.
- $\sqrt{\frac{n_1 n_0}{n^2}}$: This term scales the correlation. It reaches its maximum of 0.5 when the group sizes ($n_1$ and $n_0$) are equal, and it shrinks as the split of the dichotomous variable becomes more lopsided.
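To see how the group split affects the scaling term, the short sketch below evaluates $\sqrt{n_1 n_0 / n^2}$ for several splits of a hypothetical sample of 100 cases. The function name is illustrative.

```python
import math

def split_term(n1, n0):
    """sqrt(n1 * n0 / n^2) with n = n1 + n0."""
    n = n1 + n0
    return math.sqrt(n1 * n0 / n ** 2)

total = 100  # hypothetical sample size
for n1 in (10, 25, 50, 75, 90):
    n0 = total - n1
    print(n1, n0, round(split_term(n1, n0), 4))
```

The term peaks at 0.5 for the 50/50 split and is symmetric around it, so a 25/75 split scales the correlation exactly as much as a 75/25 split.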
Free study cues
Insight
Canonical usage
The point-biserial correlation coefficient is a dimensionless statistic, typically reported as a decimal value between -1 and +1.
Common confusion
A common mistake is to interpret correlation coefficients as percentages or to assign units to them. They are unitless measures of association strength, distinct from proportions or absolute differences.
Dimension note
The point-biserial correlation coefficient is a statistical measure that quantifies the strength and direction of a linear relationship between a dichotomous and a continuous variable.
One free problem
Practice Problem
In an item analysis, students who answered a question correctly (Group 1) had a mean score of 75, while those who answered incorrectly (Group 0) had a mean score of 60. The overall standard deviation of scores was 10. There were 30 students in Group 1 and 20 in Group 0, with a total of 50 students. Calculate the point-biserial correlation coefficient ($r_{pb}$).
Solve for: $r_{pb}$
Hint: Calculate the square root term first, then multiply by the difference in means divided by the standard deviation.
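The hint can be followed step by step in a few lines of Python. This is a sketch, assuming the stated standard deviation of 10 is the population standard deviation and that the coefficient takes the form mean difference over standard deviation times $\sqrt{n_1 n_0 / n^2}$; the variable names are illustrative.

```python
import math

mean_correct, mean_incorrect = 75, 60   # group means from the problem
sd_overall = 10                          # assumed to be the population SD
n_correct, n_incorrect = 30, 20
n_total = n_correct + n_incorrect        # 50 students

# Step 1: the square-root term.
root_term = math.sqrt(n_correct * n_incorrect / n_total ** 2)

# Step 2: multiply by the mean difference divided by the SD.
r_pb = (mean_correct - mean_incorrect) / sd_overall * root_term

print(round(root_term, 4))  # 0.4899
print(round(r_pb, 4))       # 0.7348
```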
Where it shows up
Real-World Context
Analyzing if students who answered a specific multiple-choice question correctly (dichotomous) had higher overall exam scores (continuous).
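An item analysis of this kind might look like the sketch below, which correlates each item's correct/incorrect column with the total score. The response matrix is invented for illustration, the helper name is hypothetical, and the total score here includes the item being analyzed (an uncorrected item-total correlation), which is a simplifying assumption.

```python
import math

def pearson_r(x, y):
    """Pearson's r; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented data: rows are students, columns are items (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

totals = [sum(row) for row in responses]  # total score per student
n_items = len(responses[0])

discrimination = []
for j in range(n_items):
    item_column = [row[j] for row in responses]
    discrimination.append(pearson_r(item_column, totals))

for j, r in enumerate(discrimination, start=1):
    print(f"item {j}: r_pb = {r:.3f}")
```

In this toy data the last item comes out negative: students who answered it correctly tended to have lower totals, which is the pattern that would flag an item for review.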
Study smarter
Tips
- The dichotomous variable must be truly binary (e.g., 0 or 1).
- The continuous variable should be interval or ratio scale.
- Values range from -1 to +1, similar to Pearson's r.
- A positive $r_{pb}$ means higher scores on the continuous variable are associated with the '1' category of the dichotomous variable.
- It's equivalent to Pearson's r if the dichotomous variable is coded as 0 and 1.
Avoid these traps
Common Mistakes
- Using it for two continuous variables (use Pearson's r).
- Using it for two dichotomous variables (use Phi coefficient).
- Misinterpreting the sign of the correlation if the dichotomous variable coding is arbitrary.
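The last point is easy to demonstrate: swapping which category is coded 1 flips the sign of the coefficient and leaves its magnitude unchanged. The sketch below uses invented scores and a hypothetical helper that applies the mean-difference form with the population standard deviation.

```python
import math
import statistics

def point_biserial(codes, scores):
    """r_pb for 0/1 codes using the mean-difference form."""
    group1 = [s for c, s in zip(codes, scores) if c == 1]
    group0 = [s for c, s in zip(codes, scores) if c == 0]
    n1, n0, n = len(group1), len(group0), len(scores)
    diff = statistics.mean(group1) - statistics.mean(group0)
    return diff / statistics.pstdev(scores) * math.sqrt(n1 * n0 / n ** 2)

# Invented scores; the same students under two coding schemes.
scores = [55, 61, 70, 48, 66, 73]
pass_coded_1 = [0, 1, 1, 0, 1, 1]           # pass = 1, fail = 0
pass_coded_0 = [1 - c for c in pass_coded_1]  # pass = 0, fail = 1

r_a = point_biserial(pass_coded_1, scores)
r_b = point_biserial(pass_coded_0, scores)
print(round(r_a, 4), round(r_b, 4))
```

Reporting which category was coded 1 alongside the coefficient avoids the ambiguity.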
References
Sources
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE Publications.
- Aron, A., Aron, E. N., & Coups, E. J. (2018). Statistics for Psychology (8th ed.). Pearson.
- Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric Theory (3rd ed.). McGraw-Hill.
- Wikipedia: Point-biserial correlation coefficient (Retrieved 2023-10-27).
- Gravetter, F. J., & Wallnau, L. B. (2017). Statistics for the Behavioral Sciences (10th ed.). Cengage Learning. Chapter 15: Correlation.