Data Import
Drop file here or click to browse
CSV, Excel (.xlsx, .xls), or paste from Excel
Demonstrates multiple regression with real-world relationships between price, square footage, bedrooms, and age.
Shows a preview of the first 10 rows of the imported dataset.
Visualizes relationships between all numeric variables. Values range from -1 (strong negative) to +1 (strong positive).
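Assuming the matrix shows Pearson correlations (the usual choice for a numeric-variable heatmap), each cell can be computed as below. This is a minimal TypeScript sketch with an illustrative function name, not the app's actual code.

```typescript
// Pearson correlation between two numeric columns: covariance divided by the
// product of standard deviations. Always lands in [-1, +1].
// Illustrative sketch only; not the app's implementation.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}
```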
| Variable | Coefficient | Std Error | t-stat | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|
Analyze which predictors contribute most to the model using standardized coefficients, SHAP values, VIF ranking, and permutation importance.
Run OLS or WLS regression with multiple predictors to see feature importance analysis
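Of the importance rankings listed above, the standardized coefficient is the simplest to compute by hand: the raw coefficient rescaled by the standard deviations of the predictor and the response. A minimal sketch, with illustrative names (not the app's API):

```typescript
// Sample standard deviation (n - 1 denominator).
function sd(v: number[]): number {
  const n = v.length;
  const mean = v.reduce((a, b) => a + b, 0) / n;
  return Math.sqrt(v.reduce((a, x) => a + (x - mean) ** 2, 0) / (n - 1));
}

// Standardized ("beta") coefficient: the change in Y, in standard deviations of Y,
// for a one-standard-deviation increase in X. Puts predictors measured in
// different units on a comparable scale. Illustrative sketch only.
function standardizedCoef(rawCoef: number, x: number[], y: number[]): number {
  return rawCoef * (sd(x) / sd(y));
}
```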
Tests for regression assumptions (linearity, homoscedasticity, normality, etc.)
Note: Diagnostic tests analyze residuals and can be run after any regression method.
Run regression to see diagnostic test results
Enable cross-validation and run regression to see results
| # | Actual | Predicted | Residual | Std. Residual | Leverage |
|---|---|---|---|---|---|
Adjust predictor values to see real-time model predictions.
What is Linear Regression?
Linear regression is a statistical method used to model the relationship between a dependent variable (Y) and one or more independent variables (X). It finds the best-fitting straight line (or hyperplane for multiple predictors) that minimizes the sum of squared differences between observed and predicted values.
Simple regression: One predictor (y = mx + b)
Multiple regression: Multiple predictors (y = b + m₁x₁ + m₂x₂ + ... + mₖxₖ)
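For the one-predictor case, the least-squares slope and intercept have closed-form solutions. The sketch below is illustrative only and is not necessarily the fitting code the app runs.

```typescript
// Closed-form least squares for the simple case y = m*x + b:
// slope m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², intercept b = ȳ - m·x̄.
// Minimizes the sum of squared differences between observed and predicted Y.
function fitSimple(x: number[], y: number[]): { slope: number; intercept: number } {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}
```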
Model Fit Statistics
- R² (R-squared)
- Proportion of variance in Y explained by the model. Range: 0 to 1. Higher = better fit.
- Adjusted R²
- R² adjusted for the number of predictors. Penalizes adding useless variables. Use this when comparing models with different numbers of predictors.
- F-statistic
- Tests whether the model as a whole is significant. Compares your model to a model with no predictors (see the sketch after this list).
- p-value (Model)
- Probability of observing a fit at least this strong if none of the predictors were actually related to Y. < 0.05 means the model is statistically significant.
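The fit statistics above follow directly from the residual and total sums of squares. A minimal sketch for a model with k predictors (names are illustrative, not the app's API); the model p-value is then the upper-tail probability of the F-statistic under an F distribution with k and n − k − 1 degrees of freedom.

```typescript
// R², adjusted R², and the overall F-statistic from observed and fitted values.
// k = number of predictors; n = number of observations.
function modelFit(y: number[], yHat: number[], k: number) {
  const n = y.length;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  const ssTot = y.reduce((acc, yi) => acc + (yi - meanY) ** 2, 0);      // total variation in Y
  const ssRes = y.reduce((acc, yi, i) => acc + (yi - yHat[i]) ** 2, 0); // unexplained variation
  const r2 = 1 - ssRes / ssTot;
  const adjR2 = 1 - (1 - r2) * (n - 1) / (n - k - 1); // penalizes extra predictors
  const f = (r2 / k) / ((1 - r2) / (n - k - 1));      // overall model significance test
  return { r2, adjR2, f };
}
```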
Coefficient Statistics
- Coefficient (m or b)
- The estimated effect of each variable on Y. For continuous X: change in Y for a 1-unit increase in X.
- Std Error
- Precision of the coefficient estimate. Smaller = more precise.
- t-stat
- Coefficient divided by its standard error. Larger absolute values indicate more significant predictors.
- p-value (Coefficient)
- Tests if the coefficient is significantly different from zero. < 0.05 means the variable contributes to predicting Y.
- 95% CI
- 95% Confidence Interval. We're 95% confident the true coefficient lies within this range.
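The t-statistic and confidence interval come straight from the coefficient and its standard error. A minimal sketch, using the large-sample critical value 1.96 for illustration; exact inference uses a t distribution with n − k − 1 degrees of freedom.

```typescript
// Per-coefficient inference: t-statistic and 95% confidence interval.
// tCrit = 1.96 is the large-sample approximation; the exact critical value
// comes from the t distribution with n - k - 1 degrees of freedom.
function coefInference(coef: number, stdError: number, tCrit = 1.96) {
  const tStat = coef / stdError; // how many standard errors from zero
  return {
    tStat,
    ciLower: coef - tCrit * stdError, // 95% CI lower bound
    ciUpper: coef + tCrit * stdError, // 95% CI upper bound
  };
}
```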
Residuals Diagnostics
- Residual
- Difference between actual and predicted values (Y - Ŷ). Should be randomly scattered around zero.
- Standardized Residual
- Residual divided by its standard deviation. Values > 2 or < -2 may indicate outliers.
- MSE (Mean Squared Error)
- Average squared difference between observed and predicted values. Lower = better predictions.
- Standard Error
- Standard deviation of the residuals. Typical prediction error in same units as Y.
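These residual summaries can be computed as sketched below (names are illustrative; p is the number of estimated parameters, i.e. predictors plus the intercept). Note that fully studentized residuals also divide by √(1 − leverage); the simpler version here just scales by the residual standard error.

```typescript
// Residual diagnostics from observed and fitted values.
// p = number of estimated parameters (predictors + intercept).
function residualStats(y: number[], yHat: number[], p: number) {
  const n = y.length;
  const residuals = y.map((yi, i) => yi - yHat[i]);        // Y - Ŷ
  const ssRes = residuals.reduce((acc, r) => acc + r * r, 0);
  const mse = ssRes / n;                                    // mean squared error
  const stdError = Math.sqrt(ssRes / (n - p));              // residual standard error
  const standardized = residuals.map(r => r / stdError);    // |value| > 2 → possible outlier
  return { residuals, mse, stdError, standardized };
}
```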
Significance Codes
- *** p < 0.001
- Very strong evidence against null hypothesis
- ** p < 0.01
- Strong evidence against null hypothesis
- * p < 0.05
- Moderate evidence against null hypothesis
- (no stars)
- Not statistically significant at p < 0.05 level
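Mapping a p-value to these codes is a simple threshold check, sketched below with an illustrative function name:

```typescript
// Significance codes used in the coefficient table.
function significanceStars(p: number): string {
  if (p < 0.001) return '***';
  if (p < 0.01) return '**';
  if (p < 0.05) return '*';
  return ''; // not significant at the 0.05 level
}
```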
Confidence Band vs Prediction Interval
- Confidence Band (Mean Response)
- The shaded region on the scatter plot shows where the mean of Y is likely to fall for a given X. Narrower band = more precise estimate of the regression line. Use this for estimating average values.
- Prediction Interval (Individual Values)
- The dashed amber band on the scatter plot shows where an individual observation is likely to fall. Wider than the CI because it accounts for both uncertainty in the regression line AND the natural variation of individual data points. Computed via WASM using the Rust prediction intervals engine.
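For simple regression the two bands differ only by an extra "1 +" under the square root, which is why the prediction interval is always wider. A minimal sketch of the half-widths at a point x0 (illustrative only, not the WASM engine's implementation; it uses the large-sample critical value 1.96):

```typescript
// Half-widths of the confidence band (mean response) and prediction interval
// (individual values) at x0, for simple regression.
// s = residual standard error, xs = observed predictor values.
function bandHalfWidths(s: number, xs: number[], x0: number, tCrit = 1.96) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const sxx = xs.reduce((acc, x) => acc + (x - meanX) ** 2, 0);
  const seMean = s * Math.sqrt(1 / n + (x0 - meanX) ** 2 / sxx);     // uncertainty in the fitted line
  const sePred = s * Math.sqrt(1 + 1 / n + (x0 - meanX) ** 2 / sxx); // + scatter of individual points
  return {
    confidence: tCrit * seMean, // shaded confidence band
    prediction: tCrit * sePred, // dashed amber prediction band
  };
}
```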
Assumptions of Linear Regression
- Linearity: Relationship between X and Y is linear
- Independence: Observations are independent of each other
- Homoscedasticity: Residuals have constant variance at all levels of X
- Normality: Residuals are approximately normally distributed (one check is sketched after this list)
- No multicollinearity: Predictors are not highly correlated with each other
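As an example of how the normality assumption can be checked (see the diagnostic tests described above), the Jarque-Bera statistic combines the skewness and kurtosis of the residuals; under normality it is approximately χ² with 2 degrees of freedom, so values well above ~5.99 suggest non-normal residuals. This is a sketch of one such check, not necessarily the test the app runs.

```typescript
// Jarque-Bera normality statistic for residuals:
// JB = n/6 * (skewness² + (kurtosis − 3)² / 4).
function jarqueBera(residuals: number[]): number {
  const n = residuals.length;
  const mean = residuals.reduce((a, b) => a + b, 0) / n;
  const m2 = residuals.reduce((a, r) => a + (r - mean) ** 2, 0) / n;
  const m3 = residuals.reduce((a, r) => a + (r - mean) ** 3, 0) / n;
  const m4 = residuals.reduce((a, r) => a + (r - mean) ** 4, 0) / n;
  const skewness = m3 / Math.pow(m2, 1.5);
  const kurtosis = m4 / (m2 * m2);
  return (n / 6) * (skewness ** 2 + ((kurtosis - 3) ** 2) / 4);
}
```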
Multicollinearity & VIF Explained
Multicollinearity occurs when predictor variables are highly correlated with each other. This can make it difficult to determine the individual effect of each predictor.
VIF (Variance Inflation Factor) measures how much the variance of a coefficient is inflated due to multicollinearity:
- VIF = 1: No correlation
- 1 < VIF ≤ 5: Low multicollinearity (acceptable)
- 5 < VIF ≤ 10: Moderate multicollinearity (review variables)
- VIF > 10: High multicollinearity (consider removing redundant variables)
- VIF = ∞: Perfect multicollinearity (one variable is a linear combination of others)
What to do: If VIF is high, consider removing one of the correlated variables or combining them into a single predictor.
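VIF for predictor j is 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing predictor j on all the other predictors. With exactly two predictors, R²ⱼ reduces to their squared pairwise correlation, which keeps the sketch below small; the general case requires a full auxiliary regression. Names are illustrative, not the app's API.

```typescript
// VIF in the two-predictor special case: 1 / (1 - r²), where r is the
// pairwise correlation between the predictors. Approaches ∞ as |r| → 1
// (perfect multicollinearity).
function vifTwoPredictors(x1: number[], x2: number[]): number {
  const n = x1.length;
  const m1 = x1.reduce((a, b) => a + b, 0) / n;
  const m2 = x2.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x1[i] - m1) * (x2[i] - m2);
    sxx += (x1[i] - m1) ** 2;
    syy += (x2[i] - m2) ** 2;
  }
  const r = sxy / Math.sqrt(sxx * syy);
  return 1 / (1 - r * r);
}
```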
- Chart.js — MIT License © Chart.js Contributors
- SheetJS (XLSX) — Apache 2.0 License © SheetJS LLC
- linreg-core — MIT OR Apache-2.0 License © Jesse Anderson