Quick Recap: How to estimate the prediction error for least squares fits

05 Aug 2025 - tsp
Last update 05 Aug 2025
Reading time 7 mins

When performing a least squares fit, it’s often not enough to simply compute the best-fit curve. In many applications, understanding the uncertainty or error estimates is just as important. This summary provides a short overview of least squares fitting, followed by a detailed breakdown of how to estimate:

- The total error of the fit (fit quality)
- The errors of the individual fit parameters
- The prediction error at each input $x$
- Parameter and prediction errors via bootstrapping, as an alternative to the analytical approach

A Quick Recap: What Is Least Squares Fitting?

The least squares method is used to determine the parameters of a model function $f(x;\vec{\theta})$ that best fits a set of observed data points $(x_i, y_i)$ (i.e. measurements). The goal is to minimize the sum of the squared residuals:

[ \begin{aligned} S(\vec{\theta}) &= \sum_{i=1}^{n} \left(y_i - f(x_i; \vec{\theta})\right)^2 \end{aligned} ]

For linear models this leads to a closed-form solution - for example via QR decomposition or the normal equations. For nonlinear models, iterative solvers such as Levenberg-Marquardt (like the solver behind SciPy’s curve_fit) are employed.
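
To illustrate the linear case, here is a minimal sketch (the straight-line model and all identifiers are only illustrative) that solves a line fit both via the normal equations and via NumPy's built-in least squares solver:

import numpy as np

# Illustrative straight-line model y = c0 + c1 * x with some Gaussian noise
x_lin = np.linspace(0, 10, 50)
y_lin = 2.0 + 0.5 * x_lin + np.random.normal(0, 0.2, size=x_lin.shape)

# Design matrix: a column of ones (intercept) and the x values (slope)
A = np.column_stack([np.ones_like(x_lin), x_lin])

# Closed-form solution of the normal equations (A^T A) theta = A^T y
theta_normal = np.linalg.solve(A.T @ A, A.T @ y_lin)

# Numerically more robust: NumPy's QR/SVD based least squares solver
theta_lstsq, *_ = np.linalg.lstsq(A, y_lin, rcond=None)

print(theta_normal)  # both should be close to [2.0, 0.5]
print(theta_lstsq)

The remainder of this article deals with the nonlinear case. The following example generates noisy Gaussian data and fits it with curve_fit: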

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Define the model for fitting
def model(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

# A simple method to generate flicker noise (not necessary to understand the fits)
def generate_flicker_noise(size):
    freqs = np.fft.rfftfreq(size)
    amplitudes = np.where(freqs == 0, 0, 1 / np.sqrt(freqs))
    noise_freq = amplitudes * (np.random.normal(size=len(freqs)) + 1j * np.random.normal(size=len(freqs)))
    flicker_noise = np.fft.irfft(noise_freq, n=size)
    return flicker_noise / np.std(flicker_noise)

# Generate noisy data
np.random.seed(0)
x = np.linspace(-5, 5, 100)
y_true = model(x, a=1.0, x0=0.0, sigma=1.0)
y_noise = y_true + np.random.normal(0, 0.05, size=x.shape) + generate_flicker_noise(len(x)) * 0.07

# Perform the fit
popt, _ = curve_fit(model, x, y_noise, p0=[1, 0, 1])
y_fit = model(x, *popt)

# Plot
plt.plot(x, y_noise, 'o', label='Noisy data')
plt.plot(x, y_fit, '-', label='Fitted curve')
plt.legend(); plt.grid(); plt.title("Fitted Gaussian on Noisy Data")
plt.show()

Total Error of the Fit (Fit Quality)

After fitting your model to the data you can assess how well it describes the dataset by computing the residual sum of squares $r$ (often simply called the residual) and, from it, the estimated noise variance $\hat{\sigma}^2$:

[ \begin{aligned} r &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\ \hat{\sigma}^2 &= \frac{r}{n - p} \end{aligned} ]

Here $\hat{y}_i = f(x_i; \vec{\theta})$ are the fitted values, $n$ is the number of data points and $p$ is the number of fitted parameters.

# Define model for fitting
def model(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

# Fit
popt, _ = curve_fit(model, x, y_noise, p0=[1, 0, 1])
y_fit = model(x, *popt)
residuals = y_noise - y_fit
sigma2_hat = np.sum(residuals**2) / (len(y_noise) - len(popt))
print(f"Sum of squared residuals: {np.sum(residuals**2)}")
print(f"Estimated noise variance (sigma^2): {sigma2_hat}")

For our example this yields the output:

    Sum of squared residuals: 0.5101042811709358
    Estimated noise variance (sigma^2): 0.005258807022380781

Error Estimates for Individual Parameters

To estimate the error of the individual parameters one has to take a look at the covariance matrix. This is built from the Jacobian $J$ of the model function with respect to its parameters, evaluated at the optimum. The Jacobian describes how sensitive the model is to changes in each parameter at each input value. If the parameters influence different parts of the model independently, $J^T J$ is close to diagonal. The covariance matrix is estimated as

[ \begin{aligned} Cov(\vec{\theta}) &\approx \hat{\sigma}^2 (J^T J)^{-1} \end{aligned} ]

The diagonal elements of the covariance matrix give the variances of the individual parameters:

[ Var(\theta_j) = \lbrack Cov(\vec{\theta}) \rbrack_{jj} ]

Again applying to our example data:

popt, pcov = curve_fit(model, x, y_noise, p0=[1, 0, 1])
param_std = np.sqrt(np.diag(pcov))
print(f"Fitted parameters:")
print(f"\tAmplitude (a):  {popt[0]}")
print(f"\tCenter (x0):    {popt[1]}")
print(f"\tSpread (sigma): {popt[2]}")

print(f"Parameter standard deviations:")
print(f"\tAmplitude (a):  {param_std[0]}")
print(f"\tCenter (x0):    {param_std[1]}")
print(f"\tSpread (sigma): {param_std[2]}")

This yields the following output:

    Fitted parameters:
    	Amplitude (a):  0.9612152760541642
    	Center (x0):    -0.0222346122268849
    	Spread (sigma): 1.0894849516367449
    Parameter standard deviations:
    	Amplitude (a):  0.020313055368936066
    	Center (x0):    0.026585211681826003
    	Spread (sigma): 0.026585212690588656
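
As a cross-check one can build this covariance estimate by hand: approximate the Jacobian numerically at the optimum, apply the formula from above and compare with the pcov matrix returned by curve_fit. A minimal sketch, reusing model, x, popt, pcov and sigma2_hat from the snippets above:

from scipy.optimize import approx_fprime

# Numerical Jacobian: one row per data point, one column per parameter
J = np.array([approx_fprime(popt, lambda p, xi=xi: model(xi, *p), 1e-8) for xi in x])

# Covariance estimate according to Cov(theta) ~ sigma^2 * (J^T J)^-1
cov_manual = sigma2_hat * np.linalg.inv(J.T @ J)

print(np.sqrt(np.diag(cov_manual)))  # manual estimate
print(np.sqrt(np.diag(pcov)))        # curve_fit's estimate - should closely agree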

Prediction Error at Each Input $x$

To understand how certain we are about the model’s output at a specific input $x_i$, we estimate the variance of the prediction:

[ \begin{aligned} Var \lbrack \hat{y}_i \rbrack &= J_i^T Cov(\vec{\theta}) J_i \end{aligned} ]

Here $J_i$ is the gradient (the $i$-th row of the Jacobian matrix) of the model output with respect to the parameters, evaluated at $x_i$. The total prediction variance includes both the parameter (model) uncertainty and the residual noise:

[ \begin{aligned} Var \lbrack y_{pred,i} \rbrack &= \hat{\sigma}^2 + Var \lbrack \hat{y}_i \rbrack \end{aligned} ]
from scipy.optimize import approx_fprime

def compute_prediction_variance(xdata, ydata, model, p0=None):
    popt, pcov = curve_fit(model, xdata, ydata, p0=p0)
    y_fit = model(xdata, *popt)
    residuals = ydata - y_fit
    sigma2_hat = np.sum(residuals**2) / (len(ydata) - len(popt))

    def jacobian_row(xi):
        f = lambda p: model(xi, *p)
        return approx_fprime(popt, f, epsilon=1e-8)

    var_predict = []
    for xi in xdata:
        J = jacobian_row(xi)
        var_model = J @ pcov @ J.T
        var_total = sigma2_hat + var_model
        var_predict.append(var_total)

    return np.array(var_predict), y_fit, popt

# Run the analysis
var_pred, y_fit, popt = compute_prediction_variance(x, y_noise, model)
ci_95 = 1.96 * np.sqrt(var_pred)

# Plot
plt.plot(x, y_noise, 'o', label='Data')
plt.plot(x, y_fit, '-', label='Fit')
plt.fill_between(x, y_fit - ci_95, y_fit + ci_95, alpha=0.3, label='95% Prediction Interval')
plt.legend(); plt.grid(); plt.title("Fitted Gaussian with Prediction Interval")
plt.show()

Alternative: Bootstrapping Parameter and Prediction Errors

While the analytical error propagation using the Jacobian and covariance matrix is fast and insightful, it relies on some key assumptions:

- The residuals are independent and identically distributed with (approximately) Gaussian noise of constant variance.
- The model is approximately linear in its parameters in the vicinity of the optimum.
- The number of data points is large compared to the number of parameters.

If these assumptions are violated - or if one is using a more complex or black-box model - bootstrapping offers a robust alternative. Bootstrapping is a resampling-based method. The basic idea is:

- Draw a random sample of the data points with replacement (a bootstrap sample of the same size as the original dataset).
- Refit the model to this resampled dataset.
- Repeat this many times (e.g. 1000 times) and collect the fitted parameters.

This way one can calculate:

- The empirical distribution, mean and standard deviation of each parameter.
- Confidence intervals for the parameters (e.g. from percentiles).
- Prediction bands by evaluating the model for each bootstrap parameter set.

In addition, bootstrapping is very simple to implement. For our example from above:

def bootstrap_fit(x, y, model, p0=None, n_boot=1000):
    params_boot = []
    n = len(x)
    for _ in range(n_boot):
        indices = np.random.choice(n, n, replace=True)
        x_sample = x[indices]
        y_sample = y[indices]
        try:
            popt, _ = curve_fit(model, x_sample, y_sample, p0=p0)
            params_boot.append(popt)
        except RuntimeError:
            continue  # skip fits that did not converge

    return np.array(params_boot)

# Run bootstrap
params_samples = bootstrap_fit(x, y_noise, model, p0=[1, 0, 1])
param_means = np.mean(params_samples, axis=0)
param_stds = np.std(params_samples, axis=0)

for name, mean, std in zip(['a', 'x0', 'sigma'], param_means, param_stds):
    print(f"{name}: {mean:.4f} ± {std:.4f}")

This would yield an output like

    a: 0.9613 ± 0.0171
    x0: -0.0220 ± 0.0280
    sigma: 1.0887 ± 0.0235

Each bootstrap fit yields one parameter set estimated from a random subsample of the full dataset, so after many resamples one obtains an empirical distribution of the fit parameters. This distribution can then be used to compute statistics such as means, standard deviations, and confidence intervals.
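
As an illustration, here is a minimal sketch (reusing params_samples, model, x and y_noise from above) that derives 95% percentile confidence intervals for the parameters and a pointwise band of the fitted curve. Note that this band reflects parameter uncertainty only, not the residual noise:

# 95% percentile confidence intervals for each parameter
ci_low, ci_high = np.percentile(params_samples, [2.5, 97.5], axis=0)
for name, lo, hi in zip(['a', 'x0', 'sigma'], ci_low, ci_high):
    print(f"{name}: [{lo:.4f}, {hi:.4f}]")

# Pointwise 95% band of the fitted curve over all bootstrap parameter sets
curves = np.array([model(x, *p) for p in params_samples])
band_low, band_high = np.percentile(curves, [2.5, 97.5], axis=0)

plt.plot(x, y_noise, 'o', label='Data')
plt.plot(x, np.mean(curves, axis=0), '-', label='Mean bootstrap fit')
plt.fill_between(x, band_low, band_high, alpha=0.3, label='95% bootstrap band')
plt.legend(); plt.grid(); plt.title("Bootstrap Confidence Band")
plt.show()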

This method doesn’t rely on analytical derivatives and is especially powerful when dealing with complex models or when residual analysis shows non-Gaussian behavior.
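
A quick way to perform such a residual check is to look at the residual distribution directly, for example with a histogram and a normality test. A minimal sketch using SciPy's Shapiro-Wilk test (one of several possible diagnostics), reusing x, y_noise, model and popt from above:

from scipy.stats import shapiro

# Residuals of the fit
residuals = y_noise - model(x, *popt)

# Shapiro-Wilk test: a small p-value hints at non-Gaussian residuals
stat, pvalue = shapiro(residuals)
print(f"Shapiro-Wilk p-value: {pvalue:.3f}")

# Visual check of the residual distribution
plt.hist(residuals, bins=20)
plt.title("Residual distribution")
plt.show()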

Conclusion

Least squares fitting is only the first step in model analysis. To fully understand your results, you must:

- Quantify the overall fit quality via the residual variance.
- Estimate the uncertainties of the individual fit parameters from the covariance matrix.
- Propagate these uncertainties into a prediction error at each input - or use bootstrapping when the analytical assumptions do not hold.

These tools allow you to produce confidence bands, prediction intervals, and well-founded error bars - all of which are critical for scientific and engineering analysis.



