Simple Linear Regression Line
This equation defines the line of best fit that minimizes the sum of squared residuals between observed and predicted values for a linear relationship between two variables.
This public page keeps the free explanation visible; premium worked solutions, advanced walkthroughs, and saved study tools live inside the app.
Core idea
Overview
The regression line is calculated using the Ordinary Least Squares (OLS) method, which minimizes the sum of the squared errors (residuals). The slope, b_1, represents the expected change in y per unit change in x, while the intercept, b_0, is the predicted value of y when x is zero. Together, these parameters characterize the linear trend within a dataset.
When to use: Use this when you need to model the relationship between two continuous variables and predict future outcomes based on linear trends.
Why it matters: It is the foundational tool for predictive analytics, enabling researchers and businesses to forecast trends and quantify the strength of relationships between variables.
Symbols
Variables
y^ = Predicted Value, b_1 = Slope, b_0 = Y-Intercept, x = Independent Variable, n = Sample Size
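The formula can be sketched directly from these symbols. Below is a minimal Python implementation of the computational formulas for the slope and intercept; the data points are invented for illustration:

```python
# Minimal OLS sketch using the computational formulas for b_1 and b_0.
# The data points below are illustrative, not from this page.
def ols_fit(xs, ys):
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b_1 = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: b_0 = mean(y) - b_1 * mean(x)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

b0, b1 = ols_fit([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
# For this sample data, slope ≈ 1.94 and intercept ≈ 0.15.
```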
Walkthrough
Derivation
Derivation of Simple Linear Regression Line
This derivation uses the Method of Least Squares to minimize the sum of squared residuals between observed data points and the linear regression model.
- The relationship between variables x and y is linear.
- The errors are independent and identically distributed with zero mean.
Define the Sum of Squared Residuals (SSR)
We define the objective function S as the sum of the squares of the vertical distances between each observed data point and the predicted value on the regression line.
Note: Minimizing the squared residuals ensures that positive and negative deviations do not cancel each other out.
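In symbols, the objective function described above is:

```latex
S(b_0, b_1) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
            = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2
```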
Partial Differentiation with respect to b_0
To minimize S, we take the partial derivative with respect to b_0 and set it to zero, which leads to the normal equation for the intercept.
Note: Simplifying this results in the equation b_0 = ȳ - b_1·x̄.
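This step, written out, gives the first normal equation:

```latex
\frac{\partial S}{\partial b_0}
  = -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right) = 0
\;\;\Longrightarrow\;\;
\sum_{i=1}^{n} y_i = n\,b_0 + b_1 \sum_{i=1}^{n} x_i
\;\;\Longrightarrow\;\;
b_0 = \bar{y} - b_1 \bar{x}
```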
Partial Differentiation with respect to b_1
We take the partial derivative with respect to b_1 and set it to zero to find the slope that minimizes the error.
Note: Substitute the expression for b_0 from the previous step into this equation to isolate b_1.
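Written out, this step gives the second normal equation:

```latex
\frac{\partial S}{\partial b_1}
  = -2 \sum_{i=1}^{n} x_i \left( y_i - b_0 - b_1 x_i \right) = 0
\;\;\Longrightarrow\;\;
\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2
```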
Solve the System for b_1
By substituting b_0 = ȳ - b_1·x̄ into the second normal equation and solving algebraically, we derive the computational formula for the slope coefficient.
Note: This is equivalent to b_1 = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)², the ratio of the sample covariance of x and y to the sample variance of x.
Result
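Putting the pieces together, the least-squares estimates are:

```latex
\hat{y} = b_0 + b_1 x,
\qquad
b_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```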
Source: Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to Linear Regression Analysis.
Visual intuition
Graph
Graph unavailable for this formula: it contains advanced operator notation (integrals/sums/limits).
Why it behaves this way
Intuition
Imagine a scatter plot of data points as a cloud of floating particles. The regression line acts like a rigid, weighted stick passing through the center of the cloud. The formula acts as a 'gravity' mechanism that rotates and shifts this stick until the sum of the vertical distances (squared) between the stick and every point in the cloud is at an absolute minimum.
Signs and relationships
- b_1: The sign of b_1 indicates the direction of the relationship: positive means both variables move in the same direction, while negative indicates an inverse relationship.
- b_0: This is an additive constant that shifts the entire line vertically; its least-squares value guarantees the line passes through the centroid (x̄, ȳ) of the data.
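Both properties can be checked numerically. The sketch below (with made-up data) uses the deviation form of the slope and verifies that the fitted line passes through the centroid and that the slope's sign tracks the direction of the relationship:

```python
# Demonstration (with made-up data) that the fitted line passes through
# the centroid (x̄, ȳ) and that the slope's sign tracks the direction
# of the relationship.
def slope_intercept(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    # Deviation form of the slope: Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
         sum((x - xbar) ** 2 for x in xs)
    b0 = ybar - b1 * xbar  # intercept that puts the line through the centroid
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 7, 9]      # increasing data: expect b1 > 0
ys_down = [9, 7, 5, 4, 2]    # decreasing data: expect b1 < 0

b0, b1 = slope_intercept(xs, ys_up)
xbar, ybar = sum(xs) / 5, sum(ys_up) / 5
# The centroid lies exactly on the fitted line:
print(abs((b0 + b1 * xbar) - ybar) < 1e-9)  # True
```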
One free problem
Practice Problem
Given the data points (1, 2), (2, 3), and (3, 5), calculate the slope b_1 of the regression line.
Solve for: b_1
Hint: Calculate the numerator n*sum(xy) - sum(x)*sum(y) and the denominator n*sum(x^2) - (sum(x))^2 separately.
The full worked solution stays in the interactive walkthrough.
Where it shows up
Real-World Context
An economist uses this equation to model the relationship between marketing spend and total sales revenue to predict how much revenue a specific budget will generate.
Study smarter
Tips
- Always create a scatter plot first to ensure the relationship is actually linear.
- Check for outliers, as they can disproportionately influence the slope of the regression line.
- Calculate the correlation coefficient (r) to quantify the strength and direction of the linear relationship.
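The correlation check from the last tip can be computed directly. Here is a short sketch of Pearson's r; the example data is illustrative:

```python
# Pearson correlation coefficient r for the linearity check suggested
# above (example data is illustrative).
import math

def pearson_r(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - xbar) ** 2 for x in xs))
    sy = math.sqrt(sum((y - ybar) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly linear data, r ≈ 1.0
```

A value of r near +1 or -1 supports fitting a line; a value near 0 suggests the linear model is a poor choice.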
Avoid these traps
Common Mistakes
- Assuming that a strong correlation implies causation.
- Extrapolating the regression line far beyond the range of the observed x data.
Common questions
Frequently Asked Questions
- How is the regression line derived? Using the Method of Least Squares, which minimizes the sum of squared residuals between observed data points and the linear regression model.
- When should I use this formula? When you need to model the relationship between two continuous variables and predict future outcomes based on linear trends.
- Why does it matter? It is the foundational tool for predictive analytics, enabling researchers and businesses to forecast trends and quantify the strength of relationships between variables.
- What are the most common mistakes? Assuming that a strong correlation implies causation, and extrapolating the regression line far beyond the range of the observed x data.
- Where does it show up in practice? An economist might use this equation to model the relationship between marketing spend and total sales revenue to predict how much revenue a specific budget will generate.
- How should I study it? Create a scatter plot first to confirm the relationship is linear, check for outliers that can disproportionately influence the slope, and calculate the correlation coefficient (r) to quantify the strength and direction of the relationship.
References
Sources
- Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to Linear Regression Analysis.
- Freedman, D., Pisani, R., & Purves, R. (2007). Statistics.