PyBroom Example - Simple¶

This notebook is part of pybroom.

This notebook shows the simplest usage of pybroom when performing a curve fit of a single dataset. Possible applications are only hinted. For a more complex (and interesting!) example using multiple datasets see pybroom-example-multi-datasets.

In [1]:

import numpy as np
from numpy import sqrt, pi, exp, linspace
from lmfit import Model
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'  # for hi-dpi displays

In [2]:

import lmfit
print('lmfit: %s' % lmfit.__version__)

lmfit: 0.9.5

In [3]:

import pybroom as br

Create Noisy Data¶

In [4]:

x = np.linspace(-10, 10, 101)

In [5]:

peak1 = lmfit.models.GaussianModel(prefix='p1_')
peak2 = lmfit.models.GaussianModel(prefix='p2_')
model = peak1 + peak2

In [6]:

params = model.make_params(p1_amplitude=1, p2_amplitude=1,
                           p1_sigma=1, p2_sigma=1)

In [7]:

y_data = model.eval(x=x, p1_center=-1, p2_center=2, p1_sigma=0.5, p2_sigma=1, p1_amplitude=1, p2_amplitude=2)
y_data.shape

Out[7]:

(101,)

In [8]:

y_data += np.random.randn(*y_data.shape)/10

In [9]:

plt.plot(x, y_data)

Out[9]:

[<matplotlib.lines.Line2D at 0x7fa2c866ce48>]

../_images/notebooks_pybroom-example_11_1.png

Model Fitting¶

In [10]:

params = model.make_params(p1_center=0, p2_center=3,
                           p1_sigma=0.5, p2_sigma=1,
                           p1_amplitude=1, p2_amplitude=2)
result = model.fit(y_data, x=x, params=params)

Fit result from an lmfit Model can be inspected with with fit_report or params.pretty_print():

In [11]:

print(result.fit_report())

[[Model]]
    (Model(gaussian, prefix='p1_') + Model(gaussian, prefix='p2_'))
[[Fit Statistics]]
    # function evals   = 82
    # data points      = 101
    # variables        = 6
    chi-square         = 1.259
    reduced chi-square = 0.013
    Akaike info crit   = -430.903
    Bayesian info crit = -415.212
[[Variables]]
    p1_center:     -0.99098079 +/- 0.057195 (5.77%) (init= 0)
    p1_amplitude:   1.01672511 +/- 0.094197 (9.26%) (init= 1)
    p1_sigma:       0.54812438 +/- 0.058352 (10.65%) (init= 0.5)
    p2_center:      1.99780286 +/- 0.064604 (3.23%) (init= 3)
    p2_amplitude:   2.02456302 +/- 0.121975 (6.02%) (init= 2)
    p2_sigma:       0.94915124 +/- 0.068652 (7.23%) (init= 1)
    p1_fwhm:        1.29073427 +/- 0.137409 (10.65%)  == '2.3548200*p1_sigma'
    p1_height:      0.74000475 +/- 0.064033 (8.65%)  == '0.3989423*p1_amplitude/max(1.e-15, p1_sigma)'
    p2_fwhm:        2.23508032 +/- 0.161664 (7.23%)  == '2.3548200*p2_sigma'
    p2_height:      0.85095377 +/- 0.049431 (5.81%)  == '0.3989423*p2_amplitude/max(1.e-15, p2_sigma)'
[[Correlations]] (unreported correlations are <  0.100)
    C(p1_amplitude, p1_sigma)    =  0.630
    C(p2_amplitude, p2_sigma)    =  0.630
    C(p1_amplitude, p2_sigma)    = -0.321
    C(p1_sigma, p2_sigma)        = -0.299
    C(p1_center, p2_sigma)       = -0.261
    C(p1_amplitude, p2_amplitude)  = -0.246
    C(p1_sigma, p2_amplitude)    = -0.240
    C(p1_center, p2_amplitude)   = -0.199
    C(p1_sigma, p2_center)       =  0.196
    C(p1_amplitude, p2_center)   =  0.187
    C(p1_center, p2_center)      =  0.150
    C(p1_center, p1_amplitude)   =  0.105
    C(p1_center, p1_sigma)       =  0.102
    C(p2_center, p2_sigma)       = -0.101

In [12]:

result.params.pretty_print()

Name             Value      Min      Max   Stderr     Vary     Expr
p1_amplitude     1.017     -inf      inf   0.0942     True     None
p1_center       -0.991     -inf      inf   0.0572     True     None
p1_fwhm          1.291     -inf      inf   0.1374    False 2.3548200*p1_sigma
p1_height         0.74     -inf      inf  0.06403    False 0.3989423*p1_amplitude/max(1.e-15, p1_sigma)
p1_sigma        0.5481        0      inf  0.05835     True     None
p2_amplitude     2.025     -inf      inf    0.122     True     None
p2_center        1.998     -inf      inf   0.0646     True     None
p2_fwhm          2.235     -inf      inf   0.1617    False 2.3548200*p2_sigma
p2_height        0.851     -inf      inf  0.04943    False 0.3989423*p2_amplitude/max(1.e-15, p2_sigma)
p2_sigma        0.9492        0      inf  0.06865     True     None

These methods a re convenient but extracting the data from the lmfit object requires some work and the knowledge of lmfit object structure.

pybroom comes to help, extracting data from fit results and returning pandas DataFrame in tidy format that can be much more easily manipulated, filtered and plotted.

Glance¶

Glancing at the fit results (dropping some verbose columns):

In [13]:

dg = br.glance(result)
dg.drop('model', 1).drop('message', 1)

Out[13]:

	method	num_params	num_data_points	chisqr	redchi	AIC	BIC	num_func_eval	success
0	leastsq	6	101	1.258531	0.013248	-430.902718	-415.211995	82	True

The glance function returns a DataFrame with one row per fit-result object.

Application Idea¶

If you fit N models to the same dataset you can compare statistics such as reduced-\(\chi^2\)

Or, fitting several with several methods (and datasets) you can study the convergence properties using reduced-\(\chi^2\), number of function evaluation and success rate.

Tidy¶

Tidy fit results for all the parameters:

In [14]:

dt = br.tidy(result)
dt

Out[14]:

	name	value	min	max	vary	expr	stderr	init_value
0	p1_amplitude	1.016725	-inf	inf	True	None	0.094197	1.0
1	p1_center	-0.990981	-inf	inf	True	None	0.057196	0.0
2	p1_fwhm	1.290734	-inf	inf	False	2.3548200*p1_sigma	0.137410	NaN
3	p1_height	0.740005	-inf	inf	False	0.3989423*p1_amplitude/max(1.e-15, p1_sigma)	0.064033	NaN
4	p1_sigma	0.548124	0.000000	inf	True	None	0.058353	0.5
5	p2_amplitude	2.024563	-inf	inf	True	None	0.121975	2.0
6	p2_center	1.997803	-inf	inf	True	None	0.064604	3.0
7	p2_fwhm	2.235080	-inf	inf	False	2.3548200*p2_sigma	0.161665	NaN
8	p2_height	0.850954	-inf	inf	False	0.3989423*p2_amplitude/max(1.e-15, p2_sigma)	0.049431	NaN
9	p2_sigma	0.949151	0.000000	inf	True	None	0.068653	1.0

The tidy function returns one row for each parameter.

In [15]:

dt.loc[dt.name == 'p1_center']

Out[15]:

	name	value	min	max	vary	expr	stderr	init_value
1	p1_center	-0.990981	-inf	inf	True	None	0.057196	0.0

Augment¶

Tidy dataframe with data function of the independent variable (‘x’). Columns include the data being fitted, best fit, best fit components, residuals, etc.

In [16]:

da = br.augment(result)
da.head()

Out[16]:

	x	data	best_fit	residual	Model(gaussian, prefix='p1_')	Model(gaussian, prefix='p2_')
0	-10.0	-0.213844	1.711221e-35	0.213844	1.614414e-59	1.711221e-35
1	-9.8	0.063797	2.401207e-34	-0.063797	6.076585e-57	2.401207e-34
2	-9.6	-0.122706	3.223075e-33	0.122706	2.002088e-54	3.223075e-33
3	-9.4	-0.079427	4.138359e-32	0.079427	5.774119e-52	4.138359e-32
4	-9.2	-0.032691	5.082799e-31	0.032691	1.457697e-49	5.082799e-31

The `augment <>`__ function returns one row for each data point.

In [17]:

d = br.augment(result)

In [18]:

fig, ax = plt.subplots(2, 1, figsize=(7, 8))
ax[1].plot('x', 'data', data=d, marker='o', ls='None')
ax[1].plot('x', "Model(gaussian, prefix='p1_')", data=d, lw=2, ls='--')
ax[1].plot('x', "Model(gaussian, prefix='p2_')", data=d, lw=2, ls='--')
ax[1].plot('x', 'best_fit', data=d, lw=2)
ax[0].plot('x', 'residual', data=d);

../_images/notebooks_pybroom-example_30_0.png

Application Idea¶

Fitting N datasets with the same model (or N models with the same dataset) you can automatically build a panel plot with seaborn using the dataset (or the model) as categorical variable. This example is illustrated in pybroom-example-multi-datasets.