PyBroom Example - Simple¶

This notebook is part of pybroom.

This notebook shows the simplest usage of pybroom when performing a curve fit of a single dataset. Possible applications are only hinted. For a more complex (and interesting!) example using multiple datasets see pybroom-example-multi-datasets.

In [1]:

import numpy as np
from numpy import sqrt, pi, exp, linspace
from lmfit import Model
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'  # for hi-dpi displays

In [2]:

import lmfit
print('lmfit: %s' % lmfit.__version__)

lmfit: 0.9.7

In [3]:

import pybroom as br

Create Noisy Data¶

In [4]:

x = np.linspace(-10, 10, 101)

In [5]:

peak1 = lmfit.models.GaussianModel(prefix='p1_')
peak2 = lmfit.models.GaussianModel(prefix='p2_')
model = peak1 + peak2

In [6]:

params = model.make_params(p1_amplitude=1, p2_amplitude=1,
                           p1_sigma=1, p2_sigma=1)

In [7]:

y_data = model.eval(x=x, p1_center=-1, p2_center=2, p1_sigma=0.5, p2_sigma=1, p1_amplitude=1, p2_amplitude=2)
y_data.shape

Out[7]:

(101,)

In [8]:

y_data += np.random.randn(*y_data.shape)/10

In [9]:

plt.plot(x, y_data)

Out[9]:

[<matplotlib.lines.Line2D at 0x7f0a072b36d8>]

../_images/notebooks_pybroom-example_11_1.png

Model Fitting¶

In [10]:

params = model.make_params(p1_center=0, p2_center=3,
                           p1_sigma=0.5, p2_sigma=1,
                           p1_amplitude=1, p2_amplitude=2)
result = model.fit(y_data, x=x, params=params)

Fit result from an lmfit Model can be inspected with with fit_report or params.pretty_print():

In [11]:

print(result.fit_report())

[[Model]]
    (Model(gaussian, prefix='p1_') + Model(gaussian, prefix='p2_'))
[[Fit Statistics]]
    # function evals   = 75
    # data points      = 101
    # variables        = 6
    chi-square         = 0.885
    reduced chi-square = 0.009
    Akaike info crit   = -466.467
    Bayesian info crit = -450.776
[[Variables]]
    p1_amplitude:   0.99193692 +/- 0.073905 (7.45%) (init= 1)
    p1_sigma:       0.50403814 +/- 0.043298 (8.59%) (init= 0.5)
    p1_center:     -0.93732080 +/- 0.042639 (4.55%) (init= 0)
    p2_amplitude:   1.80858848 +/- 0.098970 (5.47%) (init= 2)
    p2_sigma:       0.92432008 +/- 0.060121 (6.50%) (init= 1)
    p2_center:      2.04760247 +/- 0.057497 (2.81%) (init= 3)
    p1_fwhm:        1.18691910 +/- 0.101960 (8.59%)  == '2.3548200*p1_sigma'
    p1_height:      0.78511041 +/- 0.055989 (7.13%)  == '0.3989423*p1_amplitude/max(1.e-15, p1_sigma)'
    p2_fwhm:        2.17660743 +/- 0.141575 (6.50%)  == '2.3548200*p2_sigma'
    p2_height:      0.78059804 +/- 0.041815 (5.36%)  == '0.3989423*p2_amplitude/max(1.e-15, p2_sigma)'
[[Correlations]] (unreported correlations are <  0.100)
    C(p1_amplitude, p1_sigma)    =  0.613
    C(p2_amplitude, p2_sigma)    =  0.612
    C(p1_amplitude, p2_sigma)    = -0.263
    C(p1_sigma, p2_sigma)        = -0.251
    C(p1_center, p2_sigma)       = -0.210
    C(p1_amplitude, p2_amplitude)  = -0.194
    C(p1_sigma, p2_amplitude)    = -0.192
    C(p1_center, p2_amplitude)   = -0.154
    C(p1_sigma, p2_center)       =  0.148
    C(p1_amplitude, p2_center)   =  0.139
    C(p1_center, p2_center)      =  0.110

In [12]:

result.params.pretty_print()

Name             Value      Min      Max   Stderr     Vary     Expr Brute_Step
p1_amplitude    0.9919     -inf      inf  0.07391     True     None     None
p1_center      -0.9373     -inf      inf  0.04264     True     None     None
p1_fwhm          1.187     -inf      inf    0.102    False 2.3548200*p1_sigma     None
p1_height       0.7851     -inf      inf  0.05599    False 0.3989423*p1_amplitude/max(1.e-15, p1_sigma)     None
p1_sigma         0.504        0      inf   0.0433     True     None     None
p2_amplitude     1.809     -inf      inf  0.09897     True     None     None
p2_center        2.048     -inf      inf   0.0575     True     None     None
p2_fwhm          2.177     -inf      inf   0.1416    False 2.3548200*p2_sigma     None
p2_height       0.7806     -inf      inf  0.04182    False 0.3989423*p2_amplitude/max(1.e-15, p2_sigma)     None
p2_sigma        0.9243        0      inf  0.06012     True     None     None

These methods a re convenient but extracting the data from the lmfit object requires some work and the knowledge of lmfit object structure.

pybroom comes to help, extracting data from fit results and returning pandas DataFrame in tidy format that can be much more easily manipulated, filtered and plotted.

Glance¶

Glancing at the fit results (dropping some verbose columns):

In [13]:

dg = br.glance(result)
dg.drop('model', 1).drop('message', 1)

Out[13]:

	method	num_params	num_data_points	chisqr	redchi	AIC	BIC	num_func_eval	success
0	leastsq	6	101	0.884993	0.009316	-466.466867	-450.776144	75	True

The glance function returns a DataFrame with one row per fit-result object.

Application Idea¶

If you fit N models to the same dataset you can compare statistics such as reduced-\(\chi^2\)

Or, fitting several with several methods (and datasets) you can study the convergence properties using reduced-\(\chi^2\), number of function evaluation and success rate.

Tidy¶

Tidy fit results for all the parameters:

In [14]:

dt = br.tidy(result)
dt

Out[14]:

	name	value	min	max	vary	expr	stderr	init_value
0	p1_amplitude	0.991937	-inf	inf	True	None	0.073906	1.0
1	p1_center	-0.937321	-inf	inf	True	None	0.042640	0.0
2	p1_fwhm	1.186919	-inf	inf	False	2.3548200*p1_sigma	0.101961	NaN
3	p1_height	0.785110	-inf	inf	False	0.3989423*p1_amplitude/max(1.e-15, p1_sigma)	0.055990	NaN
4	p1_sigma	0.504038	0.000000	inf	True	None	0.043299	0.5
5	p2_amplitude	1.808588	-inf	inf	True	None	0.098970	2.0
6	p2_center	2.047602	-inf	inf	True	None	0.057498	3.0
7	p2_fwhm	2.176607	-inf	inf	False	2.3548200*p2_sigma	0.141575	NaN
8	p2_height	0.780598	-inf	inf	False	0.3989423*p2_amplitude/max(1.e-15, p2_sigma)	0.041815	NaN
9	p2_sigma	0.924320	0.000000	inf	True	None	0.060121	1.0

The tidy function returns one row for each parameter.

In [15]:

dt.loc[dt.name == 'p1_center']

Out[15]:

	name	value	min	max	vary	expr	stderr	init_value
1	p1_center	-0.937321	-inf	inf	True	None	0.04264	0.0

Augment¶

Tidy dataframe with data function of the independent variable (‘x’). Columns include the data being fitted, best fit, best fit components, residuals, etc.

In [16]:

da = br.augment(result)
da.head()

Out[16]:

	x	data	best_fit	residual	Model(gaussian, prefix='p1_')	Model(gaussian, prefix='p2_')
0	-10.0	0.022939	1.005185e-37	-0.022939	4.947068e-71	1.005185e-37
1	-9.8	-0.036681	1.647759e-36	0.036681	5.735959e-68	1.647759e-36
2	-9.6	0.071802	2.577558e-35	-0.071802	5.681800e-65	2.577558e-35
3	-9.4	0.165100	3.847602e-34	-0.165100	4.808258e-62	3.847602e-34
4	-9.2	-0.021899	5.480738e-33	0.021899	3.476253e-59	5.480738e-33

The `augment <>`__ function returns one row for each data point.

In [17]:

d = br.augment(result)

In [18]:

fig, ax = plt.subplots(2, 1, figsize=(7, 8))
ax[1].plot('x', 'data', data=d, marker='o', ls='None')
ax[1].plot('x', "Model(gaussian, prefix='p1_')", data=d, lw=2, ls='--')
ax[1].plot('x', "Model(gaussian, prefix='p2_')", data=d, lw=2, ls='--')
ax[1].plot('x', 'best_fit', data=d, lw=2)
ax[0].plot('x', 'residual', data=d);

../_images/notebooks_pybroom-example_30_0.png

Application Idea¶

Fitting N datasets with the same model (or N models with the same dataset) you can automatically build a panel plot with seaborn using the dataset (or the model) as categorical variable. This example is illustrated in pybroom-example-multi-datasets.