05 Aug 2025 - tsp
Last update 05 Aug 2025
7 mins
When performing a least squares fit, it’s often not enough to simply compute the best-fit curve. In many applications, understanding the uncertainty or error estimates is just as important. This summary provides a short overview of least squares fitting, followed by a detailed breakdown of how to estimate:
The overall quality of the fit and the variance of the residual noise
The uncertainty of the individual fit parameters
The prediction error at each point in the domain
The least squares method is used to determine the parameters of a model function $f(x;\vec{\theta})$ that best fits a set of observed data points $(x_i, y_i)$, i.e. measurements. The goal is to minimize the sum of the squared residuals:
[ \begin{aligned} S(\vec{\theta}) &= \sum_{i=1}^{n} \left(y_i - f(x_i; \vec{\theta})\right)^2 \end{aligned} ]
For linear models this leads to a closed-form solution, for example via QR decomposition or the normal equations. For nonlinear models iterative solvers such as Levenberg-Marquardt (like the solver behind SciPy's curve_fit) are employed.
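To illustrate the closed-form case, the following sketch fits a straight line by solving the linear least squares problem with np.linalg.lstsq. The data and variable names here are made up purely for illustration and are not part of the example used in the rest of this article:
import numpy as np
# Hypothetical straight-line data: y = 2 + 0.5 x plus white noise
rng = np.random.default_rng(1)
x_lin = np.linspace(0, 10, 50)
y_lin = 2.0 + 0.5 * x_lin + rng.normal(0, 0.3, size=x_lin.shape)
# Design matrix: one column of ones (intercept), one column with x
A = np.column_stack([np.ones_like(x_lin), x_lin])
# Closed-form least squares solution (np.linalg.lstsq uses an SVD-based LAPACK routine)
theta, rss, rank, sv = np.linalg.lstsq(A, y_lin, rcond=None)
print(f"Intercept: {theta[0]:.3f}, slope: {theta[1]:.3f}")
The nonlinear Gaussian example used throughout the rest of this article is instead fitted iteratively with curve_fit: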
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Define the model for fitting
def model(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))
# A simple method to generate flicker noise (not necessary to understand the fits)
def generate_flicker_noise(size):
    freqs = np.fft.rfftfreq(size)
    amplitudes = np.where(freqs == 0, 0, 1 / np.sqrt(freqs))
    noise_freq = amplitudes * (np.random.normal(size=len(freqs)) + 1j * np.random.normal(size=len(freqs)))
    flicker_noise = np.fft.irfft(noise_freq, n=size)
    return flicker_noise / np.std(flicker_noise)
# Generate noisy data
np.random.seed(0)
x = np.linspace(-5, 5, 100)
y_true = model(x, a=1.0, x0=0.0, sigma=1.0)
y_noise = y_true + np.random.normal(0, 0.05, size=x.shape) + generate_flicker_noise(len(x)) * 0.07
# Perform the fit
popt, _ = curve_fit(model, x, y_noise, p0=[1, 0, 1])
y_fit = model(x, *popt)
# Plot
plt.plot(x, y_noise, 'o', label='Noisy data')
plt.plot(x, y_fit, '-', label='Fitted curve')
plt.legend(); plt.grid(); plt.title("Fitted Gaussian on Noisy Data")
plt.show()
After fitting your model to the data you can assess how well it describes the dataset by computing the residual sum of squares $r$ (the individual differences $y_i - \hat{y}_i$ are called residuals) and, from it, an estimate $\hat{\sigma}^2$ of the noise variance, where $n$ is the number of data points and $p$ the number of fitted parameters:
[ \begin{aligned} r &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\ \hat{\sigma}^2 &= \frac{r}{n - p} \end{aligned} ]
# Define model for fitting
def model(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))
# Fit
popt, _ = curve_fit(model, x, y_noise, p0=[1, 0, 1])
y_fit = model(x, *popt)
residuals = y_noise - y_fit
sigma2_hat = np.sum(residuals**2) / (len(y_noise) - len(popt))
print(f"Sum of squared residuals: {np.sum(residuals**2)}")
print(f"Estimated noise variance (sigma^2): {sigma2_hat}")
For our example this yields the output:
Sum of squared residuals: 0.5101042811709358
Estimated noise variance (sigma^2): 0.005258807022380781
To estimate the error of the individual parameters one has to take a look at the covariance matrix. It is built from the Jacobian $J$ of the model function with respect to its parameters, evaluated at the optimum. The Jacobian describes how sensitive the model is to changes in each parameter at each input value. If the parameters influence the model independently of each other, $J^T J$ is close to diagonal and the parameter estimates are only weakly correlated. The covariance matrix is defined as
[ \begin{aligned} Cov(\vec{\theta}) &\approx \hat{\sigma}^2 (J^T J)^{-1} \end{aligned} ]
The diagonal elements of the covariance matrix give the variances of the individual parameters:
[ Var(\theta_j) = \lbrack Cov(\vec{\theta}) \rbrack_{jj} ]
Again applying to our example data:
popt, pcov = curve_fit(model, x, y_noise, p0=[1, 0, 1])
param_std = np.sqrt(np.diag(pcov))
print(f"Fitted parameters:")
print(f"\tAmplitude (a): {popt[0]}")
print(f"\tCenter (x0): {popt[1]}")
print(f"\tSpread (sigma): {popt[2]}")
print(f"Parameter standard deviations:")
print(f"\tAmplitude (a): {param_std[0]}")
print(f"\tCenter (x0): {param_std[1]}")
print(f"\tSpread (sigma): {param_std[2]}")
This yields the following output:
Fitted parameters:
Amplitude (a): 0.9612152760541642
Center (x0): -0.0222346122268849
Spread (sigma): 1.0894849516367449
Parameter standard deviations:
Amplitude (a): 0.020313055368936066
Center (x0): 0.026585211681826003
Spread (sigma): 0.026585212690588656
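As a cross-check, the covariance matrix returned by curve_fit can be reconstructed by hand from a numerically estimated Jacobian using the formula above. This is only a sketch and assumes the model, x and y_noise from the previous snippets; approx_fprime is used here for the numerical derivatives:
from scipy.optimize import approx_fprime
popt, pcov = curve_fit(model, x, y_noise, p0=[1, 0, 1])
residuals = y_noise - model(x, *popt)
sigma2_hat = np.sum(residuals**2) / (len(x) - len(popt))
# Numerically estimate the Jacobian: one row per data point, one column per parameter
J = np.array([approx_fprime(popt, lambda p, xi=xi: model(xi, *p), 1e-8) for xi in x])
# Cov(theta) ~ sigma^2 * (J^T J)^(-1)
pcov_manual = sigma2_hat * np.linalg.inv(J.T @ J)
print(np.sqrt(np.diag(pcov_manual)))  # should be close to np.sqrt(np.diag(pcov))
The resulting standard deviations should agree closely with those taken from pcov above.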
To understand how certain we are about the model’s output at a specific input $x_i$, we estimate the variance of the prediction:
[ \begin{aligned} Var \lbrack \hat{y}_i \rbrack &= J_i^T Cov(\vec{\theta}) J_i \end{aligned} ]
Here $J_i$ is the gradient (row of the Jacobian matrix) of the model output with respect to the parameters at $x_i$. The total prediction variance includes the model uncertainty and the residual noise:
[ \begin{aligned} Var \lbrack y_{pred,i} \rbrack &= \hat{\sigma}^2 + Var \lbrack \hat{y}_i \rbrack \end{aligned} ]
from scipy.optimize import approx_fprime
def compute_prediction_variance(xdata, ydata, model, p0=None):
    # Fit the model and estimate the residual noise variance
    popt, pcov = curve_fit(model, xdata, ydata, p0=p0)
    y_fit = model(xdata, *popt)
    residuals = ydata - y_fit
    sigma2_hat = np.sum(residuals**2) / (len(ydata) - len(popt))
    # Numerical gradient of the model with respect to the parameters at a single input value
    def jacobian_row(xi):
        f = lambda p: model(xi, *p)
        return approx_fprime(popt, f, epsilon=1e-8)
    # Total prediction variance = noise variance + parameter (model) uncertainty
    var_predict = []
    for xi in xdata:
        J = jacobian_row(xi)
        var_model = J @ pcov @ J.T
        var_total = sigma2_hat + var_model
        var_predict.append(var_total)
    return np.array(var_predict), y_fit, popt
# Run the analysis
var_pred, y_fit, popt = compute_prediction_variance(x, y_noise, model)
ci_95 = 1.96 * np.sqrt(var_pred)  # 1.96 = 97.5% quantile of the standard normal -> two-sided 95% interval
# Plot
plt.plot(x, y_noise, 'o', label='Data')
plt.plot(x, y_fit, '-', label='Fit')
plt.fill_between(x, y_fit - ci_95, y_fit + ci_95, alpha=0.3, label='95% Prediction Interval')
plt.legend(); plt.grid(); plt.title("Fitted Gaussian with Prediction Interval")
plt.show()
While the analytical error propagation using the Jacobian and covariance matrix is fast and insightful, it relies on some key assumptions:
The residuals are independent and identically distributed with zero mean and constant variance
The noise is approximately Gaussian
The model is approximately linear in its parameters in the vicinity of the optimum
The number of data points is large compared to the number of parameters
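Whether these assumptions hold can be checked, at least roughly, by inspecting the residuals. A minimal sketch, assuming x, y_noise and y_fit from the snippets above (the normality test is just one possible check):
from scipy import stats
residuals = y_noise - y_fit
# Residuals vs. x: visible structure hints at model misspecification or correlated noise
plt.subplot(1, 2, 1)
plt.plot(x, residuals, 'o')
plt.axhline(0, color='k', lw=0.5)
plt.title("Residuals vs. x")
# Histogram: strong asymmetry or heavy tails hint at non-Gaussian noise
plt.subplot(1, 2, 2)
plt.hist(residuals, bins=15)
plt.title("Residual histogram")
plt.show()
# D'Agostino-Pearson test: a small p-value casts doubt on normality
print(stats.normaltest(residuals))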
If these assumptions are violated, or if one is using a more complex or black-box model, bootstrapping offers a robust alternative. Bootstrapping is a resampling-based method. The basic idea is:
Draw a random sample of the data points with replacement
Refit the model to this resampled dataset
Repeat this many times and collect the fitted parameters
Analyze the resulting empirical distribution of parameter estimates
This way one can calculate:
The mean and standard deviation of each parameter
Empirical confidence intervals (e.g. from percentiles of the bootstrap distribution)
Confidence bands for the fitted curve
In addition, bootstrapping is very simple to implement. For the example above:
def bootstrap_fit(x, y, model, p0=None, n_boot=1000):
    params_boot = []
    n = len(x)
    for _ in range(n_boot):
        # Resample the data points with replacement and refit
        indices = np.random.choice(n, n, replace=True)
        x_sample = x[indices]
        y_sample = y[indices]
        try:
            popt, _ = curve_fit(model, x_sample, y_sample, p0=p0)
            params_boot.append(popt)
        except RuntimeError:
            continue  # skip resamples where the fit does not converge
    return np.array(params_boot)
# Run bootstrap
params_samples = bootstrap_fit(x, y_noise, model, p0=[1, 0, 1])
param_means = np.mean(params_samples, axis=0)
param_stds = np.std(params_samples, axis=0)
for name, mean, std in zip(['a', 'x0', 'sigma'], param_means, param_stds):
    print(f"{name}: {mean:.4f} ± {std:.4f}")
This would yield an output like
a: 0.9613 ± 0.0171
x0: -0.0220 ± 0.0280
sigma: 1.0887 ± 0.0235
Each bootstrap iteration draws a random resample of the full dataset (with replacement) and refits the model, yielding a whole set of parameter estimates. This empirical distribution can then be used to compute statistics such as means, standard deviations, and confidence intervals.
This method doesn’t rely on analytical derivatives and is especially powerful when dealing with complex models or when residual analysis shows non-Gaussian behavior.
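The bootstrap samples can also be turned directly into percentile confidence intervals and an empirical confidence band for the fitted curve. A minimal sketch, assuming params_samples, x, y_noise and model from above (note that this band reflects parameter uncertainty only, not the residual noise):
# 95% percentile confidence intervals for the parameters
for name, col in zip(['a', 'x0', 'sigma'], params_samples.T):
    ci_lo, ci_hi = np.percentile(col, [2.5, 97.5])
    print(f"{name}: [{ci_lo:.4f}, {ci_hi:.4f}]")
# Empirical 95% confidence band of the fitted curve
y_boot = np.array([model(x, *p) for p in params_samples])
band_lo, band_hi = np.percentile(y_boot, [2.5, 97.5], axis=0)
plt.plot(x, y_noise, 'o', label='Data')
plt.plot(x, np.median(y_boot, axis=0), '-', label='Bootstrap median fit')
plt.fill_between(x, band_lo, band_hi, alpha=0.3, label='95% bootstrap band')
plt.legend(); plt.grid(); plt.title("Bootstrap confidence band")
plt.show()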
Least squares fitting is only the first step in model analysis. To fully understand your results, you must:
Check the residuals and estimate the noise variance
Quantify the uncertainty of the fitted parameters via the covariance matrix
Propagate these uncertainties into prediction intervals for the model output
Fall back to resampling methods such as bootstrapping when the analytical assumptions do not hold
These tools allow you to produce confidence bands, prediction intervals, and well-founded error bars, critical for scientific and engineering analysis.
This article is tagged: Statistics, School math, Measurements, Python, Tutorial, Basics
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)