Data Structures

This section describes the basic data structures that a pump usually provides via the blob dictionary. The pump is responsible for parsing the data and creating a blob (a simple Python dictionary) for each event in the file. When a data file is processed with KM3Pipe, a module chain is used to cycle through the events. Each module in the chain receives the original, unaltered data from the pump, together with any additional information created by the preceding modules.
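
The flow described above can be sketched in plain Python. This is only an illustration of the idea: toy_pump, count_hits, first_hit and run_chain are hypothetical stand-ins and not KM3Pipe classes, and the blob keys are made up for the example.

```python
# Illustrative sketch only: plain Python stand-ins, not KM3Pipe's pump or module classes.

def toy_pump(n_events):
    """Yield one blob (a plain dict) per event, as a pump would."""
    for event_id in range(n_events):
        yield {"event_id": event_id, "hit_times": [3.0, 1.0, 2.0]}


def count_hits(blob):
    """First module: adds a new entry and leaves the pump data untouched."""
    blob["n_hits"] = len(blob["hit_times"])
    return blob


def first_hit(blob):
    """Second module: uses the pump data and the entry added by count_hits."""
    if blob["n_hits"] > 0:
        blob["first_hit_time"] = min(blob["hit_times"])
    return blob


def run_chain(pump, modules):
    """Pass every blob through the modules in order and collect the results."""
    results = []
    for blob in pump:
        for module in modules:
            blob = module(blob)
        results.append(blob)
    return results


blobs = run_chain(toy_pump(2), [count_hits, first_hit])
print(blobs[0])
```

Each module only adds keys to the blob, so later modules can rely on the original pump data still being present.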

The class used in KM3Pipe to represent almost any kind of data structure that is written to or read from disk or network connections is called Table. It is a table-like numpy ndarray (an np.recarray subclass), where each “column” (a 1D array) is accessible through its own attribute.
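
To illustrate the column access, here is a minimal sketch using a plain np.recarray as a stand-in for a Table. The field names and values are assumptions chosen for the example, and nothing Table-specific (such as h5loc) is modelled.

```python
import numpy as np

# Hypothetical stand-in for a Table: a plain np.recarray, not kp.dataclasses.Table.
dtype = [("dom_id", "<u4"), ("channel_id", "u1"), ("time", "<f8"), ("tot", "u1")]
rows = [(10, 13, 3.0, 22), (11, 21, 1.0, 23), (10, 12, 2.0, 24)]
table = np.array(rows, dtype=dtype).view(np.recarray)

print(table.time)    # one column as a 1D array, via attribute access
print(table["tot"])  # the same kind of access by field name
print(table[0])      # a single row (one hit)
```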

The Table is designed to work together with the HDF5Sink, which dumps the table data into a given location (Table.h5loc attribute) in an HDF5 file.

Tables written using the HDF5Sink can be read by the HDF5Pump, which retrieves them in pieces (groups), just as they were written during a pipeline. However, HDF5 files containing Table data can also be read by any other package, since only native HDF5 data structures are used (HDF5Compound, for the experts).

Hits and McHits

If you want to analyse the hits or create your own reconstruction, the Hits and McHits datatypes are the most important ones.

The Hits come with dom_id, channel_id, tot, time and triggered, while the McHits have a, origin, time and pmt_id. Additional information such as hit positions and directions is available in the CalibHits data structure.

All Table-backed datastructures (hits, tracks, …) have attributes which can be accessed through the following getters:

information            getter        type     Hit  McHit  CalibHit  CalibMcHit
hit time               .time         float32  X    X      X         X
time over threshold    .tot          uint8    X    -      X         -
a (number of p.e.)     .a            float32  -    X      -         X
PMT ID                 .pmt_id       uint32   -    X      -         X
Channel ID             .channel_id   uint8    X    -      X         -
DOM ID                 .dom_id       uint32   X    -      X         -
trigger information    .triggered    bool     X    -      X         -
origin (track ID)      .origin       uint32   -    X      -         X
position               .pos_[xyz]    float32  -    -      X         X
direction              .dir_[xyz]    float32  -    -      X         X
t0                     .t0           float32  -    -      X         -
du                     .du           uint8    -    -      X         X
floor                  .floor        uint8    -    -      X         X

Note that accessing, for example, .tot of a Hits table returns a 1D numpy array containing the ToTs of all hits (in the order of the hits). This makes it easy to take a quick look at the ToT distribution of the full event.
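
As a rough sketch of such a quick look, the following uses a hypothetical ToT array in place of hits.tot and counts how often each value occurs. The numbers are invented for the example.

```python
import numpy as np

# Hypothetical ToT column of one event; in practice this would be hits.tot.
tot = np.array([22, 23, 24, 23, 22, 23], dtype=np.uint8)

# Count how often each ToT value occurs (the index is the ToT value).
counts = np.bincount(tot)
for value in np.flatnonzero(counts):
    print(value, counts[value])

print("mean ToT:", tot.mean())
```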

Calibrating Hits and McHits

Both Hits and McHits have corresponding CalibHits and CalibMcHits. Those classes have additional attributes to access the position, direction and calibrated hit times. They also provide access to the DU and floor on which the hit was registered.

In order to obtain the position, direction, the t0 correction and DU/floor, you need to apply a calibration. KM3Pipe provides the Calibration class to do this for you.

To create a calibration from a detector file:

import km3pipe as kp
cal = kp.calib.Calibration(filename="path/to/detector.detx")

To apply the calibration to a set of hits:

calibrated_hits = cal.apply(hits)

That’s it: you get a CalibHits or CalibMcHits table, respectively, with pos_x, pos_y, …, dir_x, dir_y, … as well as du and floor.
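
To make clearer what kind of information a calibration attaches to each hit, here is a toy sketch in plain Python. It is not how kp.calib.Calibration works internally: pmt_info, its values and apply_toy_calibration are hypothetical, and whether and how t0 is folded into the hit time is deliberately left out.

```python
# Toy illustration only: hypothetical geometry values, not read from a real DETX file
# and not the implementation of kp.calib.Calibration.
pmt_info = {
    # (dom_id, channel_id): per-PMT calibration data
    (10, 12): {"pos": (0.0, 0.0, 10.0), "dir": (0.0, 0.0, -1.0), "t0": 0.25, "du": 1, "floor": 1},
    (10, 13): {"pos": (0.0, 0.1, 10.1), "dir": (0.0, 1.0, 0.0), "t0": 0.5, "du": 1, "floor": 1},
    (11, 21): {"pos": (0.0, 0.0, 20.0), "dir": (1.0, 0.0, 0.0), "t0": 0.75, "du": 1, "floor": 2},
}

hits = [
    {"dom_id": 10, "channel_id": 13, "time": 3.0, "tot": 22},
    {"dom_id": 11, "channel_id": 21, "time": 1.0, "tot": 23},
    {"dom_id": 10, "channel_id": 12, "time": 2.0, "tot": 24},
]


def apply_toy_calibration(hits, pmt_info):
    """Return new hit dicts extended with pos_*, dir_*, t0, du and floor."""
    calibrated = []
    for hit in hits:
        info = pmt_info[(hit["dom_id"], hit["channel_id"])]
        new_hit = dict(hit)  # copy, so the uncalibrated hits stay unchanged
        for axis, p, d in zip("xyz", info["pos"], info["dir"]):
            new_hit["pos_" + axis] = p
            new_hit["dir_" + axis] = d
        new_hit.update(t0=info["t0"], du=info["du"], floor=info["floor"])
        calibrated.append(new_hit)
    return calibrated


calibrated_hits = apply_toy_calibration(hits, pmt_info)
print(sorted(calibrated_hits[0]))
```

The lookup is keyed by (dom_id, channel_id) here because those are the identifiers a Hit carries; a sketch for McHits would need a key based on pmt_id instead.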

Another, even easier way is to calibrate your file beforehand, using the calibrate command line utility:

calibrate DETXFILE HDF5FILE

If you read in the file with the km3pipe.io.hdf5.HDF5Pump, it will automatically recognise the calibration and use the correct classes.

Sorting of Hits

All HitSeries classes derive from DTypeAttr, which implements very fast sorting based on the numpy.argsort method.
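
The idea behind argsort-based sorting can be sketched with plain numpy. The structured array below is a hypothetical stand-in that mirrors the dummy hits used in this section; it does not go through DTypeAttr or any KM3Pipe class.

```python
import numpy as np

# Hypothetical structured array mirroring the dummy hits used in this section.
hits = np.array(
    [(13, 10, 3.0, 22), (21, 11, 1.0, 23), (12, 10, 2.0, 24)],
    dtype=[("channel_id", "u1"), ("dom_id", "<u4"), ("time", "<f8"), ("tot", "u1")],
)

order = np.argsort(hits["time"])  # indices that put the times in ascending order
sorted_hits = hits[order]         # index array reorders all columns at once

print(order)
print(sorted_hits["time"])
print(sorted_hits["tot"])
```

Sorting by one column and then indexing with the resulting order keeps every row intact, and the original array is left unchanged.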

Here is an example showing how to sort a dummy hit series with 3 hits:

In [1]: import km3pipe as kp

In [2]: hits = kp.dataclasses.Table.from_template([
   ...:     [13, 21, 12],
   ...:     [10, 11, 10],
   ...:     [3, 1, 2],
   ...:     [22, 23, 24],
   ...:     [False, True, True],
   ...:     23], 'Hits')
   ...: hits
   ...:
Out[2]:
Hits <class 'km3pipe.dataclasses.Table'>
HDF5 location: /hits (split)
channel_id (dtype: |u1) = [13 21 12]
dom_id (dtype: <u4) = [10 11 10]
time (dtype: <f8) = [3. 1. 2.]
tot (dtype: |u1) = [22 23 24]
triggered (dtype: |u1) = [0 1 1]
group_id (dtype: <u4) = [23 23 23]

In [3]: hits.sorted('time')
Out[3]:
Hits <class 'km3pipe.dataclasses.Table'>
HDF5 location: /hits (split)
channel_id (dtype: |u1) = [21 12 13]
dom_id (dtype: <u4) = [11 10 10]
time (dtype: <f8) = [1. 2. 3.]
tot (dtype: |u1) = [23 24 22]
triggered (dtype: |u1) = [1 1 0]
group_id (dtype: <u4) = [23 23 23]


In [4]: for h in hits:
   ...:     print(h.time)
   ...:     print(h['tot'])
   ...:
3.0
22
1.0
23
2.0
24

In [5]: for h in hits.sorted('time'):
   ...:     print(h.time)
   ...:     print(h['tot'])
   ...:
1.0
23
2.0
24
3.0
22