11 Jan 2024 - tsp
Last update 11 Jan 2024
6 mins
Fitting models to data is a task that physicists face on a daily basis. lmfit is a very nice and easy to use library for doing this in Python. Upfront - this is not the original way one is supposed to build models using lmfit. lmfit is intended to be a thin wrapper around scipy fitting routines for different fitting methods (by default it uses the Levenberg-Marquardt least squares method - depending on the set of parameters and the type of function one can use any of the many other fitting methods supported by lmfit and scipy) as well as a way to easily build composite functions based on existing model functions. This recipe on the other hand just uses the plain minimizer interface together with a guess and an objective function. It's very similar to the method shown in the minimize documentation but differs from the approach usually used in the rest of the documentation.
Installing lmfit
The easiest way to install lmfit is using FreeBSD's pkg to fetch precompiled binaries:
pkg install math/lmfit
Another way to install is using pip:
pip install lmfit
The idea is simple: You have some data and a model function that depends on a set of parameters. You then want to decide on the choice of parameters so that your model function matches the data in the best possible way. This is one of the most common tasks that arises in many practical problems.
Upfront let’s import all required functions and classes:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Parameters, fit_report, minimize
The first thing we need is an objective function that we want to fit to our data. This can be a polynomial, a Gaussian function, etc. This function can either be evaluated - then it gets passed the coordinate at which it should be evaluated and the parameters - or it can be fitted, in which case it also receives the data. Let's take a look at how such a simple function can look for a Gaussian:
$$ f(x; c, \mu, \sigma) = c \cdot \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2} \frac{(x-\mu)^2}{\sigma^2}} $$

As one can see this function has three parameters:

- the scaling factor $c$,
- the mean $\mu$ (the position of the peak) and
- the standard deviation $\sigma$ (the width of the peak).

In the implementation below we additionally include a constant offset.
We are going to implement this method in a way that it receives the parameters as a Parameters object as its first argument, the evaluation point as its second argument and, optionally, a third argument that receives the data during fitting:
def modelFunction(pars, x, data = None):
    # First we convert the Parameters object into a dictionary,
    # this will make our code a little bit easier to understand
    vals = pars.valuesdict()

    # For the sake of readability assign our parameters to variables.
    amp = vals["amp"]
    mu = vals["mu"]
    sigma = vals["sigma"]
    offset = vals["offset"]

    # Evaluate the function
    modelValue = amp * (1.0 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu)**2 / (sigma**2))) + offset

    # If we got data calculate the residual, else return the value
    if data is not None:
        return modelValue - data
    else:
        return modelValue
Now let's assume we got a bunch of data points at the positions data_x with values data_y and we want to fit this Gaussian function. For the sake of this demonstration we're going to generate a bunch of noisy data points ourselves:
real_mu = 3.4
real_sigma = 3.1
real_amp = 7.1
real_offset = -3
real_noise_lvl = 1
data_x = np.linspace(-10, 10, 100)
data_y = real_amp * np.exp(-0.5 * (data_x - real_mu)**2 / (real_sigma**2)) / (np.sqrt(2 * np.pi) * real_sigma) + real_offset
data_y_noisy = data_y + (np.random.rand(data_y.shape[0]) - 0.5) * real_noise_lvl
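Note that np.random.rand draws uniformly distributed noise. If one wants noise that matches the Gaussian error assumption underlying least squares fitting, normally distributed noise is a drop-in alternative - a sketch using the newer NumPy Generator API, with the noise level and seed chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded so the run is reproducible

real_mu = 3.4
real_sigma = 3.1
real_amp = 7.1
real_offset = -3

data_x = np.linspace(-10, 10, 100)
data_y = real_amp * np.exp(-0.5 * (data_x - real_mu)**2 / (real_sigma**2)) / (np.sqrt(2 * np.pi) * real_sigma) + real_offset
# Zero-mean Gaussian noise with standard deviation 0.3 (arbitrary choice)
data_y_noisy = data_y + rng.normal(scale=0.3, size=data_y.shape)
```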
Let's take a short look at how this generated data looks:
fig, ax = plt.subplots()
ax.set_title("Simulated input data")
ax.plot(data_x, data_y, 'g--', label = "Real data")
ax.plot(data_x, data_y_noisy, label = "Noisy input data")
ax.grid()
ax.legend()
plt.show()
Now let's try to fit our function to the noisy data. First we create a Parameters object. For the initial values we make a naive guess of the parameters:
fit_pars = Parameters()
fit_pars.add("amp", value = np.max(data_y_noisy) - np.min(data_y_noisy), vary = True)
fit_pars.add("mu", value = data_x[np.argmax(data_y_noisy)], vary = True)
fit_pars.add("sigma", value = 1, vary = True)
fit_pars.add("offset", value = np.min(data_y_noisy), vary = True)
After initialization we just have to call minimize
:
out = minimize(modelFunction, fit_pars, args=(data_x,), kws = { 'data' : data_y_noisy })
This has already performed the fit. To get a human-readable fit report we can just use fit_report:
print(fit_report(out))
This will yield output like the following:
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 107
    # data points      = 100
    # variables        = 4
    chi-square         = 9.70078655
    reduced chi-square = 0.10104986
    Akaike info crit   = -225.296322
    Bayesian info crit = -214.875641
[[Variables]]
    amp:     9.66997823 +/- 1.66650837 (17.23%) (init = -1.615516)
    mu:      3.53609503 +/- 0.32519657 (9.20%) (init = 2.929293)
    sigma:   3.95359254 +/- 0.50650314 (12.81%) (init = 1)
    offset: -3.09039105 +/- 0.07705836 (2.49%) (init = -3.490874)
[[Correlations]] (unreported correlations are < 0.100)
    C(amp, offset)   = -0.906
    C(amp, sigma)    =  0.844
    C(sigma, offset) = -0.723
One can programmatically access $\chi^2$ via out.chisqr (in this example 9.70078654587556):
print(f"Chi squared: {out.chisqr}")
Now let’s plot the result again:
fig, ax = plt.subplots()
ax.plot(data_x, data_y_noisy, 'x', label = "Noisy input data")
ax.plot(data_x, modelFunction(out.params, data_x), 'r--', label = "Fitted function")
ax.plot(data_x, data_y, 'g', label = "Original function")
ax.grid()
ax.legend()
plt.show()
This method of course involves a lot of boilerplate code, since lmfit also supports simple model building as in the following example:
from lmfit.models import GaussianModel, ConstantModel

model = GaussianModel()
params = model.guess(data_y_noisy, x=data_x)

background = ConstantModel()
params.update(background.guess(data_y_noisy, x=data_x))

model = model + background
out = model.fit(data_y_noisy, params, x=data_x)
## Plot
fig, ax = plt.subplots()
ax.plot(data_x, data_y_noisy, 'x', label = "Original data")
ax.plot(data_x, out.best_fit, 'r', label = "Fitted function")
plt.show()
It gets really handy when one wants to build complex models, do iterative approximations that are hard to handle with other minimization or fitting systems, or perform experimental measurements in between the fitting steps.
The whole code used for this blog article can be found in a GitHub Gist.
This article is tagged: Physics, Programming, Math, Tutorial, Basics, Python, Statistics, Measurement, Gaussian, Mathematics
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/