Storing Data

In science, we often want to store results: either intermediate ones, to have a “checkpoint” in a potentially huge analysis chain, or final results, which are then used only for high-level analysis, data visualisation and interpretation. HDF5 is an open-source data format that is popular among (data) scientists and flexible enough to store all kinds of data.
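As a minimal sketch of such a checkpoint, the snippet below writes a small table of events (a structured NumPy array with named columns) to an HDF5 file and reads it back. It uses the h5py library directly rather than the classes discussed in this document. The file name, the dataset path `/reco/events`, the column names, the event values and the helpers `save_checkpoint`/`load_checkpoint` are all hypothetical, chosen only for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


# Hypothetical helpers for illustration; not part of any library discussed here.
def save_checkpoint(path, name, data, **metadata):
    """Write an array to an HDF5 file as a named dataset with metadata attributes."""
    with h5py.File(path, "a") as f:
        if name in f:
            del f[name]  # replace an older checkpoint stored under the same name
        dset = f.create_dataset(name, data=data)  # intermediate groups are created
        for key, value in metadata.items():
            dset.attrs[key] = value


def load_checkpoint(path, name):
    """Read a dataset fully into memory, together with its attributes."""
    with h5py.File(path, "r") as f:
        dset = f[name]
        return dset[()], dict(dset.attrs)


# Invented example data: one row per event, named columns of mixed types.
events = np.zeros(5, dtype=[("event_id", "i8"), ("energy", "f8"), ("n_hits", "i4")])
events["event_id"] = np.arange(5)
events["energy"] = [1.5, 2.0, 0.7, 3.2, 1.1]
events["n_hits"] = [10, 14, 6, 21, 9]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "analysis.h5")
    save_checkpoint(path, "/reco/events", events, stage="reconstruction")
    restored, attrs = load_checkpoint(path, "/reco/events")
```

Storing the metadata as attributes next to the data keeps each checkpoint self-describing, so a later analysis step can check which stage of the chain produced it.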

The Pipeline, Table and HDF5Pump/HDF5Sink classes work very well together. In this document I’ll demonstrate how to build a pipeline that analyses a file, stores intermediate results using the Table and HDF5Sink classes, and then does some basic high-level data analysis using the Pandas (https://pandas.pydata.org) framework.
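The general pattern behind this workflow can be sketched in plain Python: data flow through a chain of modules as one “blob” (here a dict) per event, and a sink at the end of the chain collects one row per blob into a table, which Pandas then summarises. This is a toy model under stated assumptions, not the API of the Pipeline, Table or HDF5Sink classes. The names `read_events`, `count_hits`, `classify`, `TableSink` and `run_pipeline` are hypothetical, the event data are invented, and the sink keeps its rows in memory instead of writing them to an HDF5 file.

```python
import pandas as pd

# Hypothetical toy sketch of the pipeline pattern; NOT the Pipeline/HDF5Sink API.


def read_events(n_events):
    """Toy pump: yields one blob (a dict) per event, with invented data."""
    for i in range(n_events):
        yield {"event_id": i, "hits": list(range(i % 4 + 1))}


def count_hits(blob):
    """Module: derives a new quantity from the raw data in the blob."""
    blob["n_hits"] = len(blob["hits"])
    return blob


def classify(blob):
    """Module: adds a simple flag based on an earlier module's result."""
    blob["is_bright"] = blob["n_hits"] >= 3
    return blob


class TableSink:
    """Toy sink: collects selected columns per blob in memory (no HDF5 here)."""

    def __init__(self, columns):
        self.columns = columns
        self.rows = []

    def __call__(self, blob):
        self.rows.append({c: blob[c] for c in self.columns})
        return blob

    def to_frame(self):
        return pd.DataFrame(self.rows, columns=self.columns)


def run_pipeline(pump, modules):
    """Push every blob through the modules in order; a None return drops it."""
    for blob in pump:
        for module in modules:
            blob = module(blob)
            if blob is None:
                break


sink = TableSink(["event_id", "n_hits", "is_bright"])
run_pipeline(read_events(8), [count_hits, classify, sink])

# High-level analysis on the collected table.
df = sink.to_frame()
summary = df.groupby("is_bright")["n_hits"].agg(["count", "mean"])
```

In the setup described in this document, the sink's role is played by HDF5Sink writing to a file, and the Pandas step would read that file back; keeping each module a small function on a blob makes every step easy to test on its own.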

Work in progress…