.. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_stats_plot_serial_killer.py: Serial Killings -- Powerlaw or not? =================================== Try to reproduce the analysis in `Cosma Shalizi's blogpost `__ with scipy. First, import things .. code-block:: python from __future__ import absolute_import, print_function, division import numpy as np import pandas as pd from scipy import stats from statsmodels.distributions.empirical_distribution import ECDF import matplotlib.pyplot as plt from km3pipe.stats import bootstrap_fit import km3pipe.style.moritz # noqa The data (times of killings) are in a text file, let's write a small function to read them. We are interested in the time differences between the murders. .. code-block:: python def get_time_deltas(fname): with open(fname) as f: raw_dates = f.read().split('\n') murder_dates = pd.Series(pd.to_datetime(raw_dates), name='Date') day = np.timedelta64(1, 'D') diffs = np.array(murder_dates.diff().iloc[1:-1]) / day diffs = pd.Series(diffs) return diffs DATES_FILE = '../data/murder_dates.txt' diffs = get_time_deltas(DATES_FILE) .. code-block:: python diffs.hist(bins='auto') .. image:: /auto_examples/stats/images/sphx_glr_plot_serial_killer_001.png :class: sphx-glr-single-img .. code-block:: python ecdf = ECDF(diffs) days, eucdf = ecdf.x, 1 - ecdf.y plt.loglog(days, eucdf) plt.xlabel('Days between murders') plt.ylabel('Cumulative probability') .. image:: /auto_examples/stats/images/sphx_glr_plot_serial_killer_002.png :class: sphx-glr-single-img Let's fit a powerlaw .. code-block:: python pareto_idx, pareto_loc, pareto_scale = stats.pareto.fit(diffs) pareto = stats.pareto(pareto_idx, pareto_loc, pareto_scale) _ = bootstrap_fit(stats.pareto, diffs, n_iter=100) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none -------------- pareto -------------- loc: +0.454 ∈ [+0.498, +0.498] (95%) scale: -1.454 ∈ [-0.002, -0.002] (95%) b: +4.514 ∈ [+3.112, +3.112] (95%) And a lognormal, because Gauss is not mocked. .. code-block:: python lognorm_sig, lognorm_shape, lognorm_scale = stats.lognorm.fit(diffs) lognorm = stats.lognorm(lognorm_sig, lognorm_shape, lognorm_scale) _ = bootstrap_fit(stats.lognorm, diffs, n_iter=100) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none -------------- lognorm -------------- loc: +2.176 ∈ [+6.613, +6.613] (95%) scale: +2.363 ∈ [+3.000, +3.000] (95%) s: +23.764 ∈ [+35.268, +35.268] (95%) .. code-block:: python plt.loglog( days, 1 - pareto.cdf(days), label='Pareto Fit (exponent {:.3})'.format(pareto_idx + 1) ) plt.loglog(days, 1 - lognorm.cdf(days), label='LogNorm Fit') plt.loglog(days, eucdf, label='Empirical CDF') plt.xlabel('Days between murders') plt.ylabel('Cumulative probability') plt.legend() .. image:: /auto_examples/stats/images/sphx_glr_plot_serial_killer_003.png :class: sphx-glr-single-img **Total running time of the script:** ( 0 minutes 18.279 seconds) .. _sphx_glr_download_auto_examples_stats_plot_serial_killer.py: .. only :: html .. container:: sphx-glr-footer :class: sphx-glr-footer-example .. container:: sphx-glr-download :download:`Download Python source code: plot_serial_killer.py ` .. container:: sphx-glr-download :download:`Download Jupyter notebook: plot_serial_killer.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_