Data Import
Drop file here or click to browse
CSV, Excel (.xlsx, .xls), or paste from Excel
Demonstrates multiple regression with real-world relationships between price, square footage, bedrooms, and age.
Shows a preview of the first 10 rows of the imported dataset.
Visualizes relationships between all numeric variables. Values range from -1 (strong negative) to +1 (strong positive).
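Assuming the matrix shows Pearson correlations (the usual choice for a numeric-variable heatmap), each cell can be computed as below. This is a minimal TypeScript sketch with an illustrative function name, not the app's actual code.

```typescript
// Pearson correlation between two numeric columns: covariance divided by the
// product of standard deviations. Always lands in [-1, +1].
// Illustrative sketch only; not the app's implementation.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}
```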
| Variable | Coefficient | Std Error | t-stat | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|
Analyze which predictors contribute most to the model using standardized coefficients, SHAP values, VIF ranking, and permutation importance.
Run OLS or WLS regression with multiple predictors to see feature importance analysis
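Of the importance rankings listed above, the standardized coefficient is the simplest to compute by hand: the raw coefficient rescaled by the standard deviations of the predictor and the response. A minimal sketch, with illustrative names (not the app's API):

```typescript
// Sample standard deviation (n - 1 denominator).
function sd(v: number[]): number {
  const n = v.length;
  const mean = v.reduce((a, b) => a + b, 0) / n;
  return Math.sqrt(v.reduce((a, x) => a + (x - mean) ** 2, 0) / (n - 1));
}

// Standardized ("beta") coefficient: the change in Y, in standard deviations of Y,
// for a one-standard-deviation increase in X. Puts predictors measured in
// different units on a comparable scale. Illustrative sketch only.
function standardizedCoef(rawCoef: number, x: number[], y: number[]): number {
  return rawCoef * (sd(x) / sd(y));
}
```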
Tests for regression assumptions (linearity, homoscedasticity, normality, etc.)
Note: Diagnostic tests analyze residuals and can be run after any regression method.
Run regression to see diagnostic test results
Enable cross-validation and run regression to see results
| # | Actual | Predicted | Residual | Std. Residual | Leverage |
|---|---|---|---|---|---|
Adjust predictor values to see real-time model predictions.
What is Linear Regression?
Linear regression is a statistical method used to model the relationship between a dependent variable (Y) and one or more independent variables (X). It finds the best-fitting straight line (or hyperplane for multiple predictors) that minimizes the sum of squared differences between observed and predicted values.
Simple regression: One predictor (y = mx + b)
Multiple regression: Multiple predictors (y = b + m₁x₁ + m₂x₂ + ... + mₖxₖ)
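For the one-predictor case, the least-squares slope and intercept have closed-form solutions. The sketch below is illustrative only and is not necessarily the fitting code the app runs.

```typescript
// Closed-form least squares for the simple case y = m*x + b:
// slope m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², intercept b = ȳ - m·x̄.
// Minimizes the sum of squared differences between observed and predicted Y.
function fitSimple(x: number[], y: number[]): { slope: number; intercept: number } {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}
```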
Model Fit Statistics
- R² (R-squared)
- Proportion of variance in Y explained by the model. Range: 0 to 1. Higher = better fit.
- Adjusted R²
- R² adjusted for the number of predictors. Penalizes adding useless variables. Use this when comparing models with different numbers of predictors.
- F-statistic
- Tests whether the model as a whole is significant. Compares your model to a model with no predictors (see the sketch after this list).
- p-value (Model)
- Probability of observing a fit at least this strong if none of the predictors were actually related to Y. < 0.05 means the model is statistically significant.
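The fit statistics above follow directly from the residual and total sums of squares. A minimal sketch for a model with k predictors (names are illustrative, not the app's API); the model p-value is then the upper-tail probability of the F-statistic under an F distribution with k and n − k − 1 degrees of freedom.

```typescript
// R², adjusted R², and the overall F-statistic from observed and fitted values.
// k = number of predictors; n = number of observations.
function modelFit(y: number[], yHat: number[], k: number) {
  const n = y.length;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  const ssTot = y.reduce((acc, yi) => acc + (yi - meanY) ** 2, 0);      // total variation in Y
  const ssRes = y.reduce((acc, yi, i) => acc + (yi - yHat[i]) ** 2, 0); // unexplained variation
  const r2 = 1 - ssRes / ssTot;
  const adjR2 = 1 - (1 - r2) * (n - 1) / (n - k - 1); // penalizes extra predictors
  const f = (r2 / k) / ((1 - r2) / (n - k - 1));      // overall model significance test
  return { r2, adjR2, f };
}
```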
Coefficient Statistics
- Coefficient (m or b)
- The estimated effect of each variable on Y. For continuous X: change in Y for a 1-unit increase in X.
- Std Error
- Precision of the coefficient estimate. Smaller = more precise.
- t-stat
- Coefficient divided by its standard error. Larger absolute values indicate more significant predictors.
- p-value (Coefficient)
- Tests if the coefficient is significantly different from zero. < 0.05 means the variable contributes to predicting Y.
- 95% CI
- 95% Confidence Interval. We're 95% confident the true coefficient lies within this range.
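The t-statistic and confidence interval come straight from the coefficient and its standard error. A minimal sketch, using the large-sample critical value 1.96 for illustration; exact inference uses a t distribution with n − k − 1 degrees of freedom.

```typescript
// Per-coefficient inference: t-statistic and 95% confidence interval.
// tCrit = 1.96 is the large-sample approximation; the exact critical value
// comes from the t distribution with n - k - 1 degrees of freedom.
function coefInference(coef: number, stdError: number, tCrit = 1.96) {
  const tStat = coef / stdError; // how many standard errors from zero
  return {
    tStat,
    ciLower: coef - tCrit * stdError, // 95% CI lower bound
    ciUpper: coef + tCrit * stdError, // 95% CI upper bound
  };
}
```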
Residuals Diagnostics
- Residual
- Difference between actual and predicted values (Y - Ŷ). Should be randomly scattered around zero.
- Standardized Residual
- Residual divided by its standard deviation. Values > 2 or < -2 may indicate outliers.
- MSE (Mean Squared Error)
- Average squared difference between observed and predicted values. Lower = better predictions.
- Standard Error
- Standard deviation of the residuals. Typical prediction error in same units as Y.
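These residual summaries can be computed as sketched below (names are illustrative; p is the number of estimated parameters, i.e. predictors plus the intercept). Note that fully studentized residuals also divide by √(1 − leverage); the simpler version here just scales by the residual standard error.

```typescript
// Residual diagnostics from observed and fitted values.
// p = number of estimated parameters (predictors + intercept).
function residualStats(y: number[], yHat: number[], p: number) {
  const n = y.length;
  const residuals = y.map((yi, i) => yi - yHat[i]);        // Y - Ŷ
  const ssRes = residuals.reduce((acc, r) => acc + r * r, 0);
  const mse = ssRes / n;                                    // mean squared error
  const stdError = Math.sqrt(ssRes / (n - p));              // residual standard error
  const standardized = residuals.map(r => r / stdError);    // |value| > 2 → possible outlier
  return { residuals, mse, stdError, standardized };
}
```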
Significance Codes
- *** p < 0.001
- Very strong evidence against null hypothesis
- ** p < 0.01
- Strong evidence against null hypothesis
- * p < 0.05
- Moderate evidence against null hypothesis
- (no stars)
- Not statistically significant at p < 0.05 level
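Mapping a p-value to these codes is a simple threshold check, sketched below with an illustrative function name:

```typescript
// Significance codes used in the coefficient table.
function significanceStars(p: number): string {
  if (p < 0.001) return '***';
  if (p < 0.01) return '**';
  if (p < 0.05) return '*';
  return ''; // not significant at the 0.05 level
}
```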
Confidence Band vs Prediction Interval
- Confidence Band (Mean Response)
- The shaded region on the scatter plot shows where the mean of Y is likely to fall for a given X. Narrower band = more precise estimate of the regression line. Use this for estimating average values.
- Prediction Interval (Individual Values)
- The dashed amber band on the scatter plot shows where an individual observation is likely to fall. Wider than the CI because it accounts for both uncertainty in the regression line AND the natural variation of individual data points. Computed via WASM using the Rust prediction intervals engine.
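For simple regression the two bands differ only by an extra "1 +" under the square root, which is why the prediction interval is always wider. A minimal sketch of the half-widths at a point x0 (illustrative only, not the WASM engine's implementation; it uses the large-sample critical value 1.96):

```typescript
// Half-widths of the confidence band (mean response) and prediction interval
// (individual values) at x0, for simple regression.
// s = residual standard error, xs = observed predictor values.
function bandHalfWidths(s: number, xs: number[], x0: number, tCrit = 1.96) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const sxx = xs.reduce((acc, x) => acc + (x - meanX) ** 2, 0);
  const seMean = s * Math.sqrt(1 / n + (x0 - meanX) ** 2 / sxx);     // uncertainty in the fitted line
  const sePred = s * Math.sqrt(1 + 1 / n + (x0 - meanX) ** 2 / sxx); // + scatter of individual points
  return {
    confidence: tCrit * seMean, // shaded confidence band
    prediction: tCrit * sePred, // dashed amber prediction band
  };
}
```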
Assumptions of Linear Regression
- Linearity: Relationship between X and Y is linear
- Independence: Observations are independent of each other
- Homoscedasticity: Residuals have constant variance at all levels of X
- Normality: Residuals are approximately normally distributed (one check is sketched after this list)
- No multicollinearity: Predictors are not highly correlated with each other
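As an example of how the normality assumption can be checked (see the diagnostic tests described above), the Jarque-Bera statistic combines the skewness and kurtosis of the residuals; under normality it is approximately χ² with 2 degrees of freedom, so values well above ~5.99 suggest non-normal residuals. This is a sketch of one such check, not necessarily the test the app runs.

```typescript
// Jarque-Bera normality statistic for residuals:
// JB = n/6 * (skewness² + (kurtosis − 3)² / 4).
function jarqueBera(residuals: number[]): number {
  const n = residuals.length;
  const mean = residuals.reduce((a, b) => a + b, 0) / n;
  const m2 = residuals.reduce((a, r) => a + (r - mean) ** 2, 0) / n;
  const m3 = residuals.reduce((a, r) => a + (r - mean) ** 3, 0) / n;
  const m4 = residuals.reduce((a, r) => a + (r - mean) ** 4, 0) / n;
  const skewness = m3 / Math.pow(m2, 1.5);
  const kurtosis = m4 / (m2 * m2);
  return (n / 6) * (skewness ** 2 + ((kurtosis - 3) ** 2) / 4);
}
```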
Multicollinearity & VIF Explained
Multicollinearity occurs when predictor variables are highly correlated with each other. This can make it difficult to determine the individual effect of each predictor.
VIF (Variance Inflation Factor) measures how much the variance of a coefficient is inflated due to multicollinearity:
- VIF = 1: No correlation
- 1 < VIF ≤ 5: Low multicollinearity (acceptable)
- 5 < VIF ≤ 10: Moderate multicollinearity (review variables)
- VIF > 10: High multicollinearity (consider removing redundant variables)
- VIF = ∞: Perfect multicollinearity (one variable is a linear combination of others)
What to do: If VIF is high, consider removing one of the correlated variables or combining them into a single predictor.
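VIF for predictor j is 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing predictor j on all the other predictors. With exactly two predictors, R²ⱼ reduces to their squared pairwise correlation, which keeps the sketch below small; the general case requires a full auxiliary regression. Names are illustrative, not the app's API.

```typescript
// VIF in the two-predictor special case: 1 / (1 - r²), where r is the
// pairwise correlation between the predictors. Approaches ∞ as |r| → 1
// (perfect multicollinearity).
function vifTwoPredictors(x1: number[], x2: number[]): number {
  const n = x1.length;
  const m1 = x1.reduce((a, b) => a + b, 0) / n;
  const m2 = x2.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x1[i] - m1) * (x2[i] - m2);
    sxx += (x1[i] - m1) ** 2;
    syy += (x2[i] - m2) ** 2;
  }
  const r = sxy / Math.sqrt(sxx * syy);
  return 1 / (1 - r * r);
}
```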
- Chart.js — MIT License © Chart.js Contributors
- SheetJS (XLSX) — Apache 2.0 License © SheetJS LLC
- linreg-core — MIT OR Apache-2.0 License © Jesse Anderson