Quick Recap: How to estimate the prediction error for least squares fits

05 Aug 2025 - tsp
Last update 05 Aug 2025
Reading time 7 mins

When performing a least squares fit, it’s often not enough to simply compute the best-fit curve. In many applications, understanding the uncertainty or error estimates is just as important. This summary provides a short overview of least squares fitting, followed by a detailed breakdown of how to estimate:

- The total error of the fit (fit quality)
- The errors of the individual fit parameters
- The prediction error at each input $x$
- Parameter and prediction errors via bootstrapping, as an alternative to the analytical approach

A Quick Recap: What Is Least Squares Fitting?

The least squares method is used to determine the parameters of a model function $f(x;\vec{\theta})$ that best fits a set of observed data points $(x_i, y_i)$ (i.e. measurements). The goal is to minimize the sum of the squared residuals:

[ \begin{aligned} S(\vec{\theta}) &= \sum_{i=1}^{n} \left(y_i - f(x_i; \vec{\theta})\right)^2 \end{aligned} ]

For linear models this leads to a closed-form solution - for example via QR decomposition or the normal equations. For nonlinear models, iterative solvers such as Levenberg-Marquardt (like the solver behind SciPy’s curve_fit) are employed.
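
To illustrate the linear case, here is a minimal sketch (the straight-line model and all identifiers are only illustrative) that solves a line fit both via the normal equations and via NumPy's built-in least squares solver:

import numpy as np

# Illustrative straight-line model y = c0 + c1 * x with some Gaussian noise
x_lin = np.linspace(0, 10, 50)
y_lin = 2.0 + 0.5 * x_lin + np.random.normal(0, 0.2, size=x_lin.shape)

# Design matrix: a column of ones (intercept) and the x values (slope)
A = np.column_stack([np.ones_like(x_lin), x_lin])

# Closed-form solution of the normal equations (A^T A) theta = A^T y
theta_normal = np.linalg.solve(A.T @ A, A.T @ y_lin)

# Numerically more robust: NumPy's QR/SVD based least squares solver
theta_lstsq, *_ = np.linalg.lstsq(A, y_lin, rcond=None)

print(theta_normal)  # both should be close to [2.0, 0.5]
print(theta_lstsq)

The remainder of this article deals with the nonlinear case. The following example generates noisy Gaussian data and fits it with curve_fit: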

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Define the model for fitting
def model(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

# A simple method to generate flicker noise (not necessary to understand the fits)
def generate_flicker_noise(size):
    freqs = np.fft.rfftfreq(size)
    amplitudes = np.where(freqs == 0, 0, 1 / np.sqrt(freqs))
    noise_freq = amplitudes * (np.random.normal(size=len(freqs)) + 1j * np.random.normal(size=len(freqs)))
    flicker_noise = np.fft.irfft(noise_freq, n=size)
    return flicker_noise / np.std(flicker_noise)

# Generate noisy data
np.random.seed(0)
x = np.linspace(-5, 5, 100)
y_true = model(x, a=1.0, x0=0.0, sigma=1.0)
y_noise = y_true + np.random.normal(0, 0.05, size=x.shape) + generate_flicker_noise(len(x)) * 0.07

# Perform the fit
popt, _ = curve_fit(model, x, y_noise, p0=[1, 0, 1])
y_fit = model(x, *popt)

# Plot
plt.plot(x, y_noise, 'o', label='Noisy data')
plt.plot(x, y_fit, '-', label='Fitted curve')
plt.legend(); plt.grid(); plt.title("Fitted Gaussian on Noisy Data")
plt.show()

Total Error of the Fit (Fit Quality)

After fitting your model to the data you can assess how well it describes the dataset by computing the residual sum of squares $r$ (often simply called the residual) and, from it, the estimated noise variance $\hat{\sigma}^2$:

[ \begin{aligned} r &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\ \hat{\sigma}^2 &= \frac{r}{n - p} \end{aligned} ]

Here $\hat{y}_i = f(x_i; \vec{\theta})$ are the fitted values, $n$ is the number of data points and $p$ is the number of fitted parameters.

# Define model for fitting
def model(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

# Fit
popt, _ = curve_fit(model, x, y_noise, p0=[1, 0, 1])
y_fit = model(x, *popt)
residuals = y_noise - y_fit
sigma2_hat = np.sum(residuals**2) / (len(y_noise) - len(popt))
print(f"Sum of squared residuals: {np.sum(residuals**2)}")
print(f"Estimated noise variance (sigma^2): {sigma2_hat}")

For our example this yields the output:

    Sum of squared residuals: 0.5101042811709358
    Estimated noise variance (sigma^2): 0.005258807022380781

Error Estimates for Individual Parameters

To estimate the error of the individual parameters one has to take a look at the covariance matrix. This is built from the Jacobian $J$ of the model function with respect to its parameters, evaluated at the optimum. The Jacobian describes how sensitive the model is to changes in each parameter at each input value. If the parameters influence different parts of the model independently, $J^T J$ is close to diagonal. The covariance matrix is estimated as

[ \begin{aligned} Cov(\vec{\theta}) &\approx \hat{\sigma}^2 (J^T J)^{-1} \end{aligned} ]

The diagonal elements of the covariance matrix give the variances of the individual parameters:

[ Var(\theta_j) = \lbrack Cov(\vec{\theta}) \rbrack_{jj} ]

Again applying to our example data:

popt, pcov = curve_fit(model, x, y_noise, p0=[1, 0, 1])
param_std = np.sqrt(np.diag(pcov))
print(f"Fitted parameters:")
print(f"\tAmplitude (a):  {popt[0]}")
print(f"\tCenter (x0):    {popt[1]}")
print(f"\tSpread (sigma): {popt[2]}")

print(f"Parameter standard deviations:")
print(f"\tAmplitude (a):  {param_std[0]}")
print(f"\tCenter (x0):    {param_std[1]}")
print(f"\tSpread (sigma): {param_std[2]}")

This yields the following output:

    Fitted parameters:
    	Amplitude (a):  0.9612152760541642
    	Center (x0):    -0.0222346122268849
    	Spread (sigma): 1.0894849516367449
    Parameter standard deviations:
    	Amplitude (a):  0.020313055368936066
    	Center (x0):    0.026585211681826003
    	Spread (sigma): 0.026585212690588656
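
As a cross-check one can build this covariance estimate by hand: approximate the Jacobian numerically at the optimum, apply the formula from above and compare with the pcov matrix returned by curve_fit. A minimal sketch, reusing model, x, popt, pcov and sigma2_hat from the snippets above:

from scipy.optimize import approx_fprime

# Numerical Jacobian: one row per data point, one column per parameter
J = np.array([approx_fprime(popt, lambda p, xi=xi: model(xi, *p), 1e-8) for xi in x])

# Covariance estimate according to Cov(theta) ~ sigma^2 * (J^T J)^-1
cov_manual = sigma2_hat * np.linalg.inv(J.T @ J)

print(np.sqrt(np.diag(cov_manual)))  # manual estimate
print(np.sqrt(np.diag(pcov)))        # curve_fit's estimate - should closely agree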

Prediction Error at Each Input $x$

To understand how certain we are about the model’s output at a specific input $x_i$, we estimate the variance of the prediction:

[ \begin{aligned} Var \lbrack \hat{y}_i \rbrack &= J_i^T Cov(\vec{\theta}) J_i \end{aligned} ]

Here $J_i$ is the gradient (the $i$-th row of the Jacobian matrix) of the model output with respect to the parameters, evaluated at $x_i$. The total prediction variance includes both the parameter (model) uncertainty and the residual noise:

[ \begin{aligned} Var \lbrack y_{pred,i} \rbrack &= \hat{\sigma}^2 + Var \lbrack \hat{y}_i \rbrack \end{aligned} ]
from scipy.optimize import approx_fprime

def compute_prediction_variance(xdata, ydata, model, p0=None):
    popt, pcov = curve_fit(model, xdata, ydata, p0=p0)
    y_fit = model(xdata, *popt)
    residuals = ydata - y_fit
    sigma2_hat = np.sum(residuals**2) / (len(ydata) - len(popt))

    def jacobian_row(xi):
        f = lambda p: model(xi, *p)
        return approx_fprime(popt, f, epsilon=1e-8)

    var_predict = []
    for xi in xdata:
        J = jacobian_row(xi)
        var_model = J @ pcov @ J.T
        var_total = sigma2_hat + var_model
        var_predict.append(var_total)

    return np.array(var_predict), y_fit, popt

# Run the analysis
var_pred, y_fit, popt = compute_prediction_variance(x, y_noise, model)
ci_95 = 1.96 * np.sqrt(var_pred)

# Plot
plt.plot(x, y_noise, 'o', label='Data')
plt.plot(x, y_fit, '-', label='Fit')
plt.fill_between(x, y_fit - ci_95, y_fit + ci_95, alpha=0.3, label='95% Prediction Interval')
plt.legend(); plt.grid(); plt.title("Fitted Gaussian with Prediction Interval")
plt.show()

Alternative: Bootstrapping Parameter and Prediction Errors

While the analytical error propagation using the Jacobian and covariance matrix is fast and insightful, it relies on some key assumptions:

- The residuals are independent and identically distributed with (approximately) Gaussian noise of constant variance.
- The model is approximately linear in its parameters in the vicinity of the optimum.
- The number of data points is large compared to the number of parameters.

If these assumptions are violated - or if one is using a more complex or black-box model - bootstrapping offers a robust alternative. Bootstrapping is a resampling-based method. The basic idea is:

- Draw a random sample of the data points with replacement (a bootstrap sample of the same size as the original dataset).
- Refit the model to this resampled dataset.
- Repeat this many times (e.g. 1000 times) and collect the fitted parameters.

This way one can calculate:

- The empirical distribution, mean and standard deviation of each parameter.
- Confidence intervals for the parameters (e.g. from percentiles).
- Prediction bands by evaluating the model for each bootstrap parameter set.

In addition, bootstrapping is very simple to implement. For our example from above:

def bootstrap_fit(x, y, model, p0=None, n_boot=1000):
    params_boot = []
    n = len(x)
    for _ in range(n_boot):
        indices = np.random.choice(n, n, replace=True)
        x_sample = x[indices]
        y_sample = y[indices]
        try:
            popt, _ = curve_fit(model, x_sample, y_sample, p0=p0)
            params_boot.append(popt)
        except RuntimeError:
            continue  # skip fits that did not converge

    return np.array(params_boot)

# Run bootstrap
params_samples = bootstrap_fit(x, y_noise, model, p0=[1, 0, 1])
param_means = np.mean(params_samples, axis=0)
param_stds = np.std(params_samples, axis=0)

for name, mean, std in zip(['a', 'x0', 'sigma'], param_means, param_stds):
    print(f"{name}: {mean:.4f} ± {std:.4f}")

This would yield an output like

    a: 0.9613 ± 0.0171
    x0: -0.0220 ± 0.0280
    sigma: 1.0887 ± 0.0235

Each bootstrap fit yields one parameter set estimated from a random subsample of the full dataset, so after many resamples one obtains an empirical distribution of the fit parameters. This distribution can then be used to compute statistics such as means, standard deviations, and confidence intervals.
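
As an illustration, here is a minimal sketch (reusing params_samples, model, x and y_noise from above) that derives 95% percentile confidence intervals for the parameters and a pointwise band of the fitted curve. Note that this band reflects parameter uncertainty only, not the residual noise:

# 95% percentile confidence intervals for each parameter
ci_low, ci_high = np.percentile(params_samples, [2.5, 97.5], axis=0)
for name, lo, hi in zip(['a', 'x0', 'sigma'], ci_low, ci_high):
    print(f"{name}: [{lo:.4f}, {hi:.4f}]")

# Pointwise 95% band of the fitted curve over all bootstrap parameter sets
curves = np.array([model(x, *p) for p in params_samples])
band_low, band_high = np.percentile(curves, [2.5, 97.5], axis=0)

plt.plot(x, y_noise, 'o', label='Data')
plt.plot(x, np.mean(curves, axis=0), '-', label='Mean bootstrap fit')
plt.fill_between(x, band_low, band_high, alpha=0.3, label='95% bootstrap band')
plt.legend(); plt.grid(); plt.title("Bootstrap Confidence Band")
plt.show()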

This method doesn’t rely on analytical derivatives and is especially powerful when dealing with complex models or when residual analysis shows non-Gaussian behavior.
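
A quick way to perform such a residual check is to look at the residual distribution directly, for example with a histogram and a normality test. A minimal sketch using SciPy's Shapiro-Wilk test (one of several possible diagnostics), reusing x, y_noise, model and popt from above:

from scipy.stats import shapiro

# Residuals of the fit
residuals = y_noise - model(x, *popt)

# Shapiro-Wilk test: a small p-value hints at non-Gaussian residuals
stat, pvalue = shapiro(residuals)
print(f"Shapiro-Wilk p-value: {pvalue:.3f}")

# Visual check of the residual distribution
plt.hist(residuals, bins=20)
plt.title("Residual distribution")
plt.show()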

Conclusion

Least squares fitting is only the first step in model analysis. To fully understand your results, you must:

- Quantify the overall fit quality via the residual variance.
- Estimate the uncertainties of the individual fit parameters from the covariance matrix.
- Propagate these uncertainties into a prediction error at each input - or use bootstrapping when the analytical assumptions do not hold.

These tools allow you to produce confidence bands, prediction intervals, and well-founded error bars - all of which are critical for scientific and engineering analysis.



