.. note::
    :class: sphx-glr-download-link-note

    Click :ref:`here <sphx_glr_download_auto_examples_plot_basic_analysis.py>` to download the full example code
.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_basic_analysis.py:


======================
Basic Analysis Example
======================


.. code-block:: python

    from __future__ import absolute_import, print_function, division

    # Authors: Tamás Gál <tgal@km3net.de>, Moritz Lotze <mlotze@km3net.de>
    # License: BSD-3
    # Date: 2017-10-10
    # Status: Under construction...


Preparation
-----------
The very first thing we do is importing our libraries and setting up
the Jupyter Notebook environment.


.. code-block:: python


    import matplotlib.pyplot as plt    # our plotting module
    import pandas as pd    # the main HDF5 reader
    import numpy as np    # must have
    import km3pipe as kp    # some KM3NeT related helper functions
    import seaborn as sns    # beautiful statistical plots!


this is just to make our plots a bit "nicer", you can skip it


.. code-block:: python

    import km3pipe.style
    km3pipe.style.use("km3pipe")


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Loading style definitions from '/home/docs/checkouts/readthedocs.org/user_builds/km3pipe/conda/stable/lib/python3.5/site-packages/km3pipe/kp-data/stylelib/km3pipe.mplstyle'


Accessing the Data File(s)
--------------------------
In the following, we will work with one random simulation file with
reconstruction information from JGandalf which has been converted
from ROOT to HDF5 using the ``tohdf5`` command line tool provided by
``KM3Pipe``.

You can find the documentation here:
https://km3py.pages.km3net.de/km3pipe/cmd.html#tohdf


Note for Lyon Users
~~~~~~~~~~~~~~~~~~~
If you are working on the Lyon cluster, you can activate the latest KM3Pipe
with the following command (put it in your ``~/.bashrc`` to load it
automatically in each shell session)::

    source $KM3NET_THRONG_DIR/src/python/pyenv.sh


Converting from ROOT to HDF5 (if needed)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Choose a file (take e.g. one from /in2p3/km3net/mc/...),
load the appropriate Jpp/Aanet version and convert it via::

    tohdf5 --aa-format=the_file.root --ignore-hits --skip-header

Note that you may have to ``--skip-header`` otherwise you might
encounter some segfaults. There is currently a big mess of different
versions of libraries in several levels of the MC file processings.

The ``--ignore-hits`` will skip the hit information, so the converted file
is much smaller (normally around 2-3 MBytes). Skip this option if you want
to read the hit information too. The file will still be smaller than the
ROOT file (about 1/3).

Luckily, a handful people are preparing the HDF5 conversion, so in future
you can download them directly, without thinking about which Jpp or Aanet
version you need to open them.


First Look at the Data
----------------------


.. code-block:: python


    filepath = "data/basic_analysis_sample.h5"


We can have a quick look at the file with the ``ptdump`` command
in the terminal::

    ptdump filename.h5

For further information, check out the documentation of the KM3NeT HDF5
format definition: http://km3pipe.readthedocs.io/en/latest/hdf5.html


The ``/event_info`` table contains general information about each event.
The data is a simple 2D table and each event is represented by a single row.

Let's have a look at the first few rows:


.. code-block:: python

    event_info = pd.read_hdf(filepath, '/event_info')
    print(event_info.head(5))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    det_id  frame_index  livetime_sec  ...  weight_w3  run_id  event_id
    0      -1            5             0  ...    0.07448       1         0
    1      -1            8             0  ...    0.13710       1         1
    2      -1           13             0  ...    0.11890       1         2
    3      -1           15             0  ...    0.29150       1         3
    4      -1           18             0  ...    0.10220       1         4

    [5 rows x 17 columns]


Next, we will read out the MC tracks which are stored under ``/mc_tracks``.


.. code-block:: python


    tracks = pd.read_hdf(filepath, '/mc_tracks')


also read event info, for things like weights


.. code-block:: python


    info = pd.read_hdf(filepath, '/event_info')


It has a similar structure, but now you can have multiple rows which belong
to an event. The ``event_id`` column holds the ID of the corresponding event.


.. code-block:: python


    print(tracks.head(10))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    bjorkeny     dir_x     dir_y     dir_z  ...   pos_z  time  type  event_id
    0  0.057346 -0.616448 -0.781017 -0.100017  ... -71.802     0   -14         0
    1  0.000000  0.488756 -0.535017 -0.689111  ... -71.802     0    22         0
    2  0.000000 -0.656758 -0.746625 -0.105925  ... -71.802     0   -13         0
    3  0.000000  0.412029 -0.878991 -0.240015  ... -71.802     0  2112         0
    4  0.000000 -0.664951 -0.468928  0.581332  ... -71.802     0  -211         0
    5  0.437484  0.113983  0.914457  0.388298  ...  30.360     0   -14         1
    6  0.000000 -0.345462  0.923065 -0.169138  ...  30.360     0    22         1
    7  0.000000  0.381285  0.828365  0.410406  ...  30.360     0   -13         1
    8  0.000000 -0.191181  0.907296  0.374518  ...  30.360     0  3122         1
    9  0.000000 -0.244006  0.922082  0.300377  ...  30.360     0   321         1

    [10 rows x 15 columns]


We now are accessing the first track for each event by grouping via
``event_id`` and calling the ``first()`` method of the
``Pandas.DataFrame`` object.


.. code-block:: python


    primaries = tracks.groupby('event_id').first()


Here are the first 5 primaries:


.. code-block:: python

    print(primaries.head(5))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    bjorkeny     dir_x     dir_y     dir_z  ...    pos_y   pos_z  time  type
    event_id                                          ...                             
    0         0.057346 -0.616448 -0.781017 -0.100017  ...   67.589 -71.802     0   -14
    1         0.437484  0.113983  0.914457  0.388298  ... -109.844  30.360     0   -14
    2         0.549859 -0.186416 -0.385939 -0.903493  ...  101.459 -30.985     0    14
    3         0.056390 -0.371672  0.550002 -0.747902  ...   15.056  24.474     0   -14
    4         0.049141 -0.124809 -0.979083  0.160683  ...   88.981 -65.848     0    14

    [5 rows x 14 columns]


Creating some Fancy Graphs
--------------------------


.. code-block:: python

    primaries.energy.hist(bins=100, log=True)
    plt.xlabel('energy [GeV]')
    plt.ylabel('number of events')
    plt.title('Energy Distribution')


.. image:: /auto_examples/images/sphx_glr_plot_basic_analysis_001.png
    :class: sphx-glr-single-img


.. code-block:: python

    primaries.bjorkeny.hist(bins=100)
    plt.xlabel('bjorken-y')
    plt.ylabel('number of events')
    plt.title('bjorken-y Distribution')


.. image:: /auto_examples/images/sphx_glr_plot_basic_analysis_002.png
    :class: sphx-glr-single-img


.. code-block:: python

    zeniths = kp.math.zenith(primaries.filter(regex='^dir_.?$'))
    primaries['zenith'] = zeniths

    plt.hist(np.cos(primaries.zenith), bins=21, histtype='step', linewidth=2)
    plt.xlabel(r'cos($\theta$)')
    plt.ylabel('number of events')
    plt.title('Zenith Distribution')


.. image:: /auto_examples/images/sphx_glr_plot_basic_analysis_003.png
    :class: sphx-glr-single-img


Starting positions of primaries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code-block:: python

    plt.hist2d(primaries.pos_x, primaries.pos_y, bins=100, cmap='viridis')
    plt.xlabel('x [m]')
    plt.ylabel('y [m]')
    plt.title('2D Plane')
    plt.colorbar()


.. image:: /auto_examples/images/sphx_glr_plot_basic_analysis_004.png
    :class: sphx-glr-single-img


If you have seaborn installed (`pip install seaborn`), you can easily create
nice jointplots:


.. code-block:: python

    try:
        import seaborn as sns    # noqa
        km3pipe.style.use("km3pipe")    # reset matplotlib style
    except:
        print("No seaborn found, skipping example.")
    else:
        g = sns.jointplot('pos_x', 'pos_y', data=primaries, kind='hex')
        g.set_axis_labels("x [m]", "y[m]")
        plt.subplots_adjust(right=0.90)    # make room for the colorbar
        plt.title("2D Plane")
        plt.colorbar()
        plt.legend()


.. image:: /auto_examples/images/sphx_glr_plot_basic_analysis_005.png
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Loading style definitions from '/home/docs/checkouts/readthedocs.org/user_builds/km3pipe/conda/stable/lib/python3.5/site-packages/km3pipe/kp-data/stylelib/km3pipe.mplstyle'


.. code-block:: python

    from mpl_toolkits.mplot3d import Axes3D    # noqa
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter3D(primaries.pos_x, primaries.pos_y, primaries.pos_z, s=3)
    ax.set_xlabel('x [m]', labelpad=10)
    ax.set_ylabel('y [m]', labelpad=10)
    ax.set_zlabel('z [m]', labelpad=10)
    ax.set_title('3D Plane')


.. image:: /auto_examples/images/sphx_glr_plot_basic_analysis_006.png
    :class: sphx-glr-single-img


.. code-block:: python

    gandalfs = pd.read_hdf(filepath, '/reco/gandalf')
    print(gandalfs.head(5))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    beta0     beta1        chi2  ...  type  upgoing_vs_downgoing  event_id
    0  0.016788  0.011857  -53.119816  ...   0.0             -0.274836         0
    1  0.007835  0.005533  -32.504874  ...   0.0              3.907941         1
    2  0.012057  0.008456  -81.195134  ...   0.0             -0.385038         2
    3  0.007858  0.005554 -200.985734  ...   0.0             -0.809872         3
    4  0.011166  0.007366  -89.451264  ...   0.0             -0.167897         4

    [5 rows x 83 columns]


.. code-block:: python

    gandalfs.columns


.. code-block:: python

    plt.hist(gandalfs['lambda'], bins=50, log=True)
    plt.xlabel('lambda parameter')
    plt.ylabel('count')
    plt.title('Lambda Distribution of Reconstructed Events')


.. image:: /auto_examples/images/sphx_glr_plot_basic_analysis_007.png
    :class: sphx-glr-single-img


.. code-block:: python

    gandalfs['zenith'] = kp.math.zenith(gandalfs.filter(regex='^dir_.?$'))

    plt.hist((gandalfs.zenith - primaries.zenith).dropna(), bins=100)
    plt.xlabel(r'true zenith - reconstructed zenith [rad]')
    plt.ylabel('count')
    plt.title('Zenith Reconstruction Difference')


.. image:: /auto_examples/images/sphx_glr_plot_basic_analysis_008.png
    :class: sphx-glr-single-img


.. code-block:: python

    l = 0.2
    lambda_cut = gandalfs['lambda'] < l
    plt.hist((gandalfs.zenith - primaries.zenith)[lambda_cut].dropna(), bins=100)
    plt.xlabel(r'true zenith - reconstructed zenith [rad]')
    plt.ylabel('count')
    plt.title('Zenith Reconstruction Difference for lambda < {}'.format(l))


.. image:: /auto_examples/images/sphx_glr_plot_basic_analysis_009.png
    :class: sphx-glr-single-img


Combined zenith reco plot for different lambda cuts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code-block:: python


    fig, ax = plt.subplots()
    for l in [100, 5, 2, 1, 0.1]:
        l_cut = gandalfs['lambda'] < l
        ax.hist((primaries.zenith - gandalfs.zenith)[l_cut].dropna(),
                bins=100,
                label=r"$\lambda$ = {}".format(l),
                alpha=.7)
    plt.xlabel(r'true zenith - reconstructed zenith [rad]')
    plt.ylabel('count')
    plt.legend()
    plt.title('Zenith Reconstruction Difference for some Lambda Cuts')


.. image:: /auto_examples/images/sphx_glr_plot_basic_analysis_010.png
    :class: sphx-glr-single-img


Fitting Angular resolutions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's fit some distributions: gaussian + lorentz (aka norm + cauchy)

Fitting the gaussian to the whole range is a very bad fit, so
we make a second gaussian fit only to +- 10 degree.
Conversely, the Cauchy (lorentz) distribution is a near perfect fit
(note that ``2 gamma = FWHM``).


.. code-block:: python


    from scipy.stats import cauchy, norm    # noqa

    residuals = gandalfs.zenith - primaries.zenith
    cut = (gandalfs['lambda'] < l) & (np.abs(residuals) < 2 * np.pi)
    residuals = residuals[cut]
    info[cut]

    # convert rad -> deg
    residuals = residuals * 180 / np.pi

    pi = 180
    # x axis for plotting
    x = np.linspace(-pi, pi, 1000)

    c_loc, c_gamma = cauchy.fit(residuals)
    fwhm = 2 * c_gamma

    g_mu_bad, g_sigma_bad = norm.fit(residuals)
    g_mu, g_sigma = norm.fit(residuals[np.abs(residuals) < 10])

    plt.hist(residuals, bins='auto', label='Histogram', normed=True, alpha=.7)
    plt.plot(
        x,
        cauchy(c_loc, c_gamma).pdf(x),
        label='Lorentz: FWHM $=${:.3f}'.format(fwhm),
        linewidth=2
    )
    plt.plot(
        x,
        norm(g_mu_bad, g_sigma_bad).pdf(x),
        label='Unrestricted Gauss: $\sigma =$ {:.3f}'.format(g_sigma_bad),
        linewidth=2
    )
    plt.plot(
        x,
        norm(g_mu, g_sigma).pdf(x),
        label='+- 10 deg Gauss: $\sigma =$ {:.3f}'.format(g_sigma),
        linewidth=2
    )
    plt.xlim(-pi / 4, pi / 4)
    plt.xlabel('Zenith residuals / deg')
    plt.legend()


.. image:: /auto_examples/images/sphx_glr_plot_basic_analysis_011.png
    :class: sphx-glr-single-img


We can also look at the median resolution without doing any fits.

In textbooks, this metric is also called Median Absolute Deviation.


.. code-block:: python


    resid_median = np.median(residuals)
    residuals_shifted_by_median = residuals - resid_median
    absolute_deviation = np.abs(residuals_shifted_by_median)
    resid_mad = np.median(absolute_deviation)

    plt.hist(np.abs(residuals), alpha=.7, bins='auto', label='Absolute residuals')
    plt.axvline(resid_mad, label='MAD: {:.2f}'.format(resid_mad), linewidth=3)
    plt.title("Average resolution: {:.3f} degree".format(resid_mad))
    plt.legend()
    plt.xlabel('Absolute zenith residuals / deg')


.. image:: /auto_examples/images/sphx_glr_plot_basic_analysis_012.png
    :class: sphx-glr-single-img


**Total running time of the script:** ( 0 minutes  16.695 seconds)


.. _sphx_glr_download_auto_examples_plot_basic_analysis.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download

     :download:`Download Python source code: plot_basic_analysis.py <plot_basic_analysis.py>`


  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: plot_basic_analysis.ipynb <plot_basic_analysis.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.readthedocs.io>`_