DBManager

The DBManager class provides an easy access to data stored in the KM3NeT Oracle database and takes care of the whole authentication procedure. It uses an in-memory cache for the fetched data to reduce network traffic and I/O. All you need to do is create an instance of the DBManager class.

Dataformats

The database web API uses three different formats: ASCII (CSV like), JSON and XML. All the ASCII output is parsed in DBManager and converted to pandas DataFrames, which is the de facto standard for representing large data in Python. Great benefits of using pandas is its well written documentation, a huge number of tutorials/books, tens of thousands of questions and answers on stackoverflow and of course the nice and active scientific community behind it. The JSON output of the database web API is wrapped in Python classes to make your life easier.

DB Authentication

There are two ways to create a database connection, either using your KM3NeT database username and password, or via a special permanent session cookie which can be requested after a successful login.

If you create a DBManager instance without any initialisation arguments, it will ask you for your username and password and upon successful login it gives you the option to request a permanent session cookie and stores it in your KM3Pipe configuration file under ~/.km3net:

>>> import km3pipe as kp
>>> db = kp.db.DBManager()
Please enter your KM3NeT DB username: tgal
Password:
Request permanent session? (y/n)y

Your ~/.km3net configuration file should now look something like this:

[DB]
session_cookie = sid=_username_12.34.56.78_504c293fac5b4ddbba2cfc3dc33eaadc

If you ever encounter any issues with the database communication, try deleting the session_cookie line in the configuration file and request a new one as described above.

You can also add your username and DB password to the ~/.km3net config file like this:

[DB]
session_cookie = sid=...
username = MYNAME
password = INTERNALPASS

Important note: due to security reasons, the ~/.km3net configuration file should not be readable to others. KM3Pipe will warn you and will also display the command to set the appropriate permission:

chmod 600 ~/.km3net

Retrieving the list of detectors

The registered detectors can be access via:

>>> db.detectors
          OID  SERIALNUMBER LOCATIONID       CITY
0   D_DU1CPPM             2  A00070004  Marseille
1   A00350276             3  A00070003     Napoli
2   A00350280             4  A00070005    Bologna
3   D_DU2NAPO             5  A00070003     Napoli
4   D_TESTDET             6  A00070002   Fisciano
5   D_ARCA001             7  A00073795      Italy
6   FR_INFRAS             8  A00073796     France
7   D_DU003NA             9  A00070003     Napoli
8   D_DU004NA            12  A00070003     Napoli
9   D_DU001MA            13  A00070004  Marseille
10  D_ARCA003            14  A00073795      Italy
11  A01495728            21  A00074177  Amsterdam
12  A01495762            22  A00074177  Amsterdam
13  D_BCI0001            23  A00070002   Fisciano

The data you get is a pandas.DataFrame instance, which represents itself as an ASCII table.

To demonstrate the table like DataFrame structure, see the following tiny example on how to create a list of the cities or look for the OID given a detector serialnumber:

>>> dts = db.detectors
>>> dts.CITY.drop_duplicates().sort_values()
11    Amsterdam
2       Bologna
4      Fisciano
6        France
5         Italy
0     Marseille
1        Napoli
>>> dts[dts.SERIALNUMBER == 14].OID
10    D_ARCA003
Name: OID, dtype: object

Don’t be confused about the left column, those are the actual indices of the rows in the original pandas DataFrame.

CLBs

The CLBMap class is a convenient tool to check the CLB parameters like UPI, floor, DU or just to find out a base for a given DU:

>>> import km3pipe as kp
>>> clbmap = kp.db.CLBMap("D_ORCA003")  # use the det OID
>>> clbmap.base(1)
CLB(
    det_oid='D_ORCA003',
    floor=0,
    du=1,
    serial_number=267,
    upi='3.4.3.2/V2-2-1/2.267',
    dom_id=808476701
)
>>> clbmap.upi['3.4.3.2/V2-2-1/2.267'].dom_id
808476701
>>> clbmap.dom_id[808959411].floor
5

Fun with DOMs

Important note: the following method will be deprecated soon and replaced by the `CLBMap` as described in the previous subsection.

To retrieve information about DOMs, the DBManager provides a handy DOMContainer class, which can be access via:

>>> db.doms
<km3pipe.db.DOMContainer object at 0x110daea10>

You can take a look at the docstring of the class using Pythons help function:

>>> help(db.doms)
class DOMContainer(__builtin__.object)
 |  Provides easy access to DOM parameters stored in the DB.
 |
 |  Methods defined here:
 |
 |  __init__(self, doms)
 |
 |  clbupi2domid(self, clb_upi, det_id)
 |      Return DOM ID for given CLB UPI and detector
 |
 |  clbupi2floor(self, clb_upi, det_id)
 |      Return Floor ID for given CLB UPI and detector
 |
 |  domid2floor(self, dom_id, det_id)
 |      Return Floor ID for given DOM ID and detector
 |
 |  ids(self, det_id)
 |      Return a list of DOM IDs for given detector
 |
 |  via_clb_upi(self, clb_upi)
 |      return DOM for given CLB UPI
 |
 |  via_dom_id(self, dom_id)
 |      Return DOM for given dom_id
 |
 |  via_omkey(self, omkey, det_id)
 |      Return DOM for given OMkey (DU, floor)

The most important methods are probablly via_clb_upi, via_dom_id and via_omkey. All of them will return an instance of DOM which is basically a struct, holding the usual DOM information. The via_omkey method takes a tuple (DU, floor) and also requires the detector OID. Here are some examples how to use these methods:

>>> a_dom = db.doms.via_omkey((2, 16), "D_ARCA003")
>>> a_dom
DU2-DOM16 - DOM ID: 809548782
   DOM UPI: 3.4/CH25H/1.60
   CLB UPI: 3.4.3.2/V2-2-1/2.594
   DET OID: D_ARCA003

>>> print(a_dom)
DU2-DOM16
>>> a_dom.clb_upi
'3.4.3.2/V2-2-1/2.594'
>>> a_dom.floor
16
>>> a_dom.du
2

>>> another_dom = db.doms.via_clb_upi("3.4.3.2/V2-2-1/2.296")
>>> print(another_dom)
DU2-DOM9
>>> another_dom
DU2-DOM9 - DOM ID: 808951763
   DOM UPI: 3.4/CH39H/1.53
   CLB UPI: 3.4.3.2/V2-2-1/2.296
   DET OID: D_ARCA003

Datalogs

This is probably the most interesting part of the database. The datalogs is a meta table which provides access to hundreds of different parameter types.

Parameters

The available parameters can be inspected via the ParametersContainer class which is – just like the DOMContainer – automatically instantiated and accessible as an attribute of the DBManager:

>>> db.parameters
<km3pipe.db.ParametersContainer object at 0x110d22250>

A quick peek on help(db.parameters) reveals a few methods and attributes:

>>> help(db.parameters)
class ParametersContainer(__builtin__.object)
 |  Provides easy access to parameters
 |
 |  Methods defined here:
 |
 |  __init__(self, parameters)
 |
 |  get_converter(self, parameter)
 |      Generate unit conversion function for given parameter
 |
 |  get_parameter(self, parameter)
 |      Return a dict for given parameter
 |
 |  unit(self, parameter)
 |      Get the unit for given parameter
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  names
 |      A list of parameter names

The names attribute gives you a list of available parameters:

>>> len(db.parameters.names)
277
>>> db.parameters.names[:5]
['led_model', 'pmt_serialnumber', 'bps_breaker', 'humid',
'pwr_meas[9] power_measurement_12v_lvl']

The above example shows the first 5 parameters out of 277 entries. If you see a number enclosed by brackets in a parameter name, like "pwr_meas[9] power_measurement_12v_lvl" in the list above, it means that "pwr_meas" is a parameter-array and the value at index 9 is aliased to power_measurement_12v_lvl. The latter name should be used if you want to retrieve the corresponding data from the DB.

Parameter Units and Value Conversions

The ParametersContainer has three methods to access information about a given parameter. The get_converter() method returns a function to be used to convert the raw values stored for a given parameter to match the target unit, which is returned by the unit() method:

>>> humid_converter = db.parameters.get_converter("humid")
>>> humid_converter(987)
9.870000000000001
>>> db.parameters.unit("humid")
'%'

Retrieving Parameter Data

The datalog method provides an easy way to retrieve data for a given detector and run or range of runs. It returns a pandas DataFrame instance:

>>> humid = db.datalog("humid", run=4780, det_id="D_ARCA003")
Database lookup took 3.931s (CPU 0.192s).
>>> type(humid)
<class 'pandas.core.frame.DataFrame'>

The head() and tail() methods can be used to get the first or last rows:

>>> humid.head(3)
    RUN       UNIXTIME           SOURCE_NAME PARAMETER_NAME  DATA_VALUE  \
0  4780  1478735722766  3.4.3.2/V2-2-1/2.138          humid        3694
1  4780  1478735732768  3.4.3.2/V2-2-1/2.138          humid        3694
2  4780  1478735742766  3.4.3.2/V2-2-1/2.138          humid        3694

                          DATETIME  VALUE
0 2016-11-09 23:55:22.766000+00:00  36.94
1 2016-11-09 23:55:32.768000+00:00  36.94
2 2016-11-09 23:55:42.766000+00:00  36.94

The DATA_VALUE is the column which holds the recorded data (the “raw values”). The VALUE column is automatically added by the DBManager – if the parameter has a valid unit and conversion score entry in the database – by applying the above mentioned get_converter() method on the DATA_VALUE column. If the data contains a UNIXTIME column, a DATETIME field will be added too, which allows using all the magical date filtering methods.

StreamDS

You already learned how to use the DBManager to connect to the database and access information. The StreamDS class is a specific helper, which connects to the StreamDS (Stream Data Service) of the KM3NeT database web server interface. The StreamDS is used to retrieve large datasets which could possibly reach and exceed GB size.

StreamDS uses the DBManager to connect to the database and you instantiate the same way:

>>> import km3pipe as kp
>>> sds = kp.db.StreamDS()
Please enter your KM3NeT DB username: tgal
Password:
Request permanent session? (y/n)y

Notice that you won’t be asked for the password or session if you already put your credentials into your ~/.km3net configuration or created a permanent session before (and your IP has not changed since then).

If you type sds. and press <TAB>, you will see a list of available methods and getters for all available streams. The methods are generated dynamically, so it is always up to date with the latest web API:

>>> sds.
sds.ahrs(                        sds.pmt_available_hvtuned_sets(
sds.clbmap(                      sds.pmt_best_hv_settings(
sds.clbmon(                      sds.pmt_hv_run_settings(
sds.clbmondomid(                 sds.pmt_hv_settings(
sds.clbmonpos(                   sds.pmt_hv_tuning_settings(
sds.clbmonupi(                   sds.pmtdarkbox(
sds.datalogevents(               sds.print_streams(
sds.datalognumbers(              sds.runs(
sds.datalogstrings(              sds.runsummarynumbers(
sds.detcalibrations(             sds.streams
sds.detectors(                   sds.t0(
sds.dmvars(                      sds.t0sets(
sds.get(                         sds.toa(
sds.integration(                 sds.toashort(
sds.jobs(                        sds.upi(
sds.mandatory_selectors(         sds.vendorhv(
sds.optional_selectors(          sds.vendorhvrunsetup(

To get a full list of available streams:

>>> sds.streams
['detectors', 'runs', 'jobs', 'datalognumbers', 'datalogstrings',
 'datalogevents', 'vendorhv', 'vendorhvrunsetup', 't0sets', 't0',
 'ahrs', 'upi', 'pmtdarkbox', 'dmvars', 'detcalibrations',
 'pmt_hv_settings', 'pmt_hv_tuning_settings', 'pmt_hv_run_settings',
 'pmt_best_hv_settings', 'pmt_available_hvtuned_sets', 'integration',
 'clbmon', 'clbmonupi', 'clbmondomid', 'clbmonpos', 'clbmap', 'toa',
 'toashort', 'runsummarynumbers']

To print all streams including their selectors and data formats, use the sds.print_streams() function:

>>> sds.print_streams()
detectors
Shows all the detectors, optionally selecting by site oid or city.
  available formats:   txt
  mandatory selectors: -
  optional selectors:  locationid,city

runs
Shows all runs for a detector (mandatory selection by detid or serialnumber). Optionally, a single run may be specified.
  available formats:   txt
  mandatory selectors: detid
  optional selectors:  run

jobs
Shows all detector run jobs for a detector within a minimum and maximum Unix time (all mandatory selections). Optionally, selections may consider priority, runsetupid, oid.
  available formats:   txt
  mandatory selectors: detid,unixmintime,unixmaxtime
  optional selectors:  priority,runsetupid,oid,localid
...
...
...

If you are using ipython (recommended), you can get a quick help if you type for example sds.vendorhv? to see what the vendorhv stream does and which selectors it needs (if you are using the plain python REPL, type help(sds.vendorhv) instead. Also notice that some completion features are only supported for Python 3.3+ (you should update to Python 3.6 anyways…):

>>> sds.vendorhv?
Signature: sds.vendorhv(detid, *, pmtserial)
Docstring: Shows vendor-suggested HV for a detector (mandatory selection by detid or serialnumber). Optionally, a single PMT may be specified.
File:      ~/Dev/km3pipe/km3pipe/db.py
Type:      function

As you can see, the Signature indicates that detid is mandatory and the keyword(s) after the * are optional (in this case pmtserial).

Let’s retrieve some data:

>>> sds.vendorhv(detid=14)
  DUID  FLOORID  CABLEPOS  PMTSERIAL  PMT_SUPPLY_VOLTAGE
  0        1        1         0       1838               -1010
  1        2        1         0        704               -1080
  2        3        1         0       5586               -1030
  3        2        1         1       6461                -990
  4        3        1         1       6483               -1100
  5        1        1         1       4944                -930

That’s it. You always get a Pandas DataFrame back. Have fun!

The streamds CLI

There is also a command line utility called streamds, which can be used to interact with the database directly from the shell:

$ streamds --help
Access the KM3NeT StreamDS DataBase service.

Usage:
    streamds
    streamds list
    streamds upload CSV_FILE
    streamds info STREAM
    streamds get STREAM [PARAMETERS...]
    streamds (-h | --help)
    streamds --version

Options:
    STREAM      Name of the stream.
    CSV_FILE    Tab separated data for the runsummary tables.
    PARAMETERS  List of parameters separated by space (e.g. detid=29).
    -h --help   Show this screen.

Uploading “runsummarynumbers”

You can use the streamds upload CSV_FILE command to upload data to the “runsummarynumbers” meta table of the KM3NeT database. Please discuss in advance any new types of parameters with database experts and create a wiki page which describes them in detail.

The required columns are run, det_id and source. The source column is a free string-type column. It is recommended to use the DOM ID if you have parameters which refer to DOMs. If you have a column which refers to the whole run, use the string "run" in the source column e.g. for a parameter which refers to a DU, you can set it to "du1" etc.

Here is an example of a CSV file:

det_id  run_id  source     n_active_doms highest_rate
29      523     whole_run  18            230042
29      523     du1        3             123000
29      524     whole_run  17            500023

Please note that the whole file will be rejected if there is even a single row of data which is already present in the database.