DBManager¶
The DBManager class provides easy access to data stored in the KM3NeT Oracle database and takes care of the whole authentication procedure. It uses an in-memory cache for the fetched data to reduce network traffic and I/O. All you need to do is create an instance of the DBManager class.
Dataformats¶
The database web API uses three different formats: ASCII (CSV-like), JSON and XML. All ASCII output is parsed by DBManager and converted to pandas DataFrames, which are the de facto standard for representing large data in Python.
Great benefits of using pandas are its well-written documentation, a huge number of tutorials and books, tens of thousands of questions and answers on Stack Overflow and of course the nice and active scientific community behind it.
The JSON output of the database web API is wrapped in Python classes to make
your life easier.
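As a standalone sketch of what this conversion looks like (the payload below is made up; DBManager handles the real web API responses for you), CSV-like ASCII can be parsed into a DataFrame with pandas directly:

```python
import io

import pandas as pd

# Hypothetical ASCII payload, shaped like the CSV-like database output.
ascii_data = "OID,SERIALNUMBER,CITY\nD_DU1CPPM,2,Marseille\nA00350276,3,Napoli\n"

# DBManager performs a conversion along these lines internally.
df = pd.read_csv(io.StringIO(ascii_data))

print(df.shape)  # (2, 3)
print(list(df.columns))  # ['OID', 'SERIALNUMBER', 'CITY']
```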
DB Authentication¶
There are two ways to create a database connection, either using your KM3NeT database username and password, or via a special permanent session cookie which can be requested after a successful login.
If you create a DBManager instance without any initialisation arguments, it will ask you for your username and password. Upon successful login it gives you the option to request a permanent session cookie, which is stored in your KM3Pipe configuration file under ~/.km3net:
>>> import km3pipe as kp
>>> db = kp.db.DBManager()
Please enter your KM3NeT DB username: tgal
Password:
Request permanent session? (y/n)y
Your ~/.km3net configuration file should now look something like this:
[DB]
session_cookie = sid=_username_12.34.56.78_504c293fac5b4ddbba2cfc3dc33eaadc
If you ever encounter issues with the database communication, try deleting the session_cookie line in the configuration file and requesting a new one as described above.
You can also add your username and DB password to the ~/.km3net config file like this:
[DB]
session_cookie = sid=...
username = MYNAME
password = INTERNALPASS
Important note: for security reasons, the ~/.km3net configuration file should not be readable by others. KM3Pipe will warn you and display the command to set the appropriate permissions:
chmod 600 ~/.km3net
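If you want to verify the permissions programmatically, a minimal sketch using only the standard library (this is illustrative; it is not the exact check KM3Pipe performs) could look like this:

```python
import os
import stat

def is_private(path):
    """Return True if the file is not accessible by group or others."""
    mode = os.stat(path).st_mode
    return not (mode & (stat.S_IRWXG | stat.S_IRWXO))

# The equivalent of `chmod 600 ~/.km3net` from Python:
# os.chmod(os.path.expanduser("~/.km3net"), stat.S_IRUSR | stat.S_IWUSR)
```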
Retrieving the list of detectors¶
The registered detectors can be accessed via:
>>> db.detectors
OID SERIALNUMBER LOCATIONID CITY
0 D_DU1CPPM 2 A00070004 Marseille
1 A00350276 3 A00070003 Napoli
2 A00350280 4 A00070005 Bologna
3 D_DU2NAPO 5 A00070003 Napoli
4 D_TESTDET 6 A00070002 Fisciano
5 D_ARCA001 7 A00073795 Italy
6 FR_INFRAS 8 A00073796 France
7 D_DU003NA 9 A00070003 Napoli
8 D_DU004NA 12 A00070003 Napoli
9 D_DU001MA 13 A00070004 Marseille
10 D_ARCA003 14 A00073795 Italy
11 A01495728 21 A00074177 Amsterdam
12 A01495762 22 A00074177 Amsterdam
13 D_BCI0001 23 A00070002 Fisciano
The data you get is a pandas.DataFrame instance, which represents itself as an ASCII table.
To demonstrate the table-like DataFrame structure, see the following tiny example of how to create a list of the cities or look up the OID for a given detector serial number:
>>> dts = db.detectors
>>> dts.CITY.drop_duplicates().sort_values()
11 Amsterdam
2 Bologna
4 Fisciano
6 France
5 Italy
0 Marseille
1 Napoli
>>> dts[dts.SERIALNUMBER == 14].OID
10 D_ARCA003
Name: OID, dtype: object
Don’t be confused by the left column: those are the actual indices of the rows in the original pandas DataFrame.
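To illustrate this with a self-contained example (using made-up data instead of the database):

```python
import pandas as pd

dts = pd.DataFrame({
    "SERIALNUMBER": [2, 3, 14],
    "CITY": ["Marseille", "Napoli", "Italy"],
})

# Filtering keeps the original row labels (the left column)...
subset = dts[dts.SERIALNUMBER == 14]
print(subset.index.tolist())  # [2]

# ...use reset_index() if you prefer a fresh 0-based index.
print(subset.reset_index(drop=True).index.tolist())  # [0]
```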
CLBs¶
The CLBMap class is a convenient tool to check CLB parameters like UPI, floor and DU, or to find the base of a given DU:
>>> import km3pipe as kp
>>> clbmap = kp.db.CLBMap("D_ORCA003") # use the det OID
>>> clbmap.base(1)
CLB(
det_oid='D_ORCA003',
floor=0,
du=1,
serial_number=267,
upi='3.4.3.2/V2-2-1/2.267',
dom_id=808476701
)
>>> clbmap.upi['3.4.3.2/V2-2-1/2.267'].dom_id
808476701
>>> clbmap.dom_id[808959411].floor
5
Fun with DOMs¶
Important note: the following method will be deprecated soon and replaced by the `CLBMap` as described in the previous subsection.
To retrieve information about DOMs, the DBManager provides a handy DOMContainer class, which can be accessed via:
>>> db.doms
<km3pipe.db.DOMContainer object at 0x110daea10>
You can take a look at the docstring of the class using Python’s help function:
>>> help(db.doms)
class DOMContainer(__builtin__.object)
| Provides easy access to DOM parameters stored in the DB.
|
| Methods defined here:
|
| __init__(self, doms)
|
| clbupi2domid(self, clb_upi, det_id)
| Return DOM ID for given CLB UPI and detector
|
| clbupi2floor(self, clb_upi, det_id)
| Return Floor ID for given CLB UPI and detector
|
| domid2floor(self, dom_id, det_id)
| Return Floor ID for given DOM ID and detector
|
| ids(self, det_id)
| Return a list of DOM IDs for given detector
|
| via_clb_upi(self, clb_upi)
| return DOM for given CLB UPI
|
| via_dom_id(self, dom_id)
| Return DOM for given dom_id
|
| via_omkey(self, omkey, det_id)
| Return DOM for given OMkey (DU, floor)
The most important methods are probably via_clb_upi, via_dom_id and via_omkey. All of them return an instance of DOM, which is basically a struct holding the usual DOM information.
The via_omkey method takes a tuple (DU, floor) and also requires the detector OID.
Here are some examples of how to use these methods:
>>> a_dom = db.doms.via_omkey((2, 16), "D_ARCA003")
>>> a_dom
DU2-DOM16 - DOM ID: 809548782
DOM UPI: 3.4/CH25H/1.60
CLB UPI: 3.4.3.2/V2-2-1/2.594
DET OID: D_ARCA003
>>> print(a_dom)
DU2-DOM16
>>> a_dom.clb_upi
'3.4.3.2/V2-2-1/2.594'
>>> a_dom.floor
16
>>> a_dom.du
2
>>> another_dom = db.doms.via_clb_upi("3.4.3.2/V2-2-1/2.296")
>>> print(another_dom)
DU2-DOM9
>>> another_dom
DU2-DOM9 - DOM ID: 808951763
DOM UPI: 3.4/CH39H/1.53
CLB UPI: 3.4.3.2/V2-2-1/2.296
DET OID: D_ARCA003
Datalogs¶
This is probably the most interesting part of the database. The datalog is a meta table which provides access to hundreds of different parameter types.
Parameters¶
The available parameters can be inspected via the ParametersContainer class which is – just like the DOMContainer – automatically instantiated and accessible as an attribute of the DBManager:
>>> db.parameters
<km3pipe.db.ParametersContainer object at 0x110d22250>
A quick peek at help(db.parameters) reveals a few methods and attributes:
>>> help(db.parameters)
class ParametersContainer(__builtin__.object)
| Provides easy access to parameters
|
| Methods defined here:
|
| __init__(self, parameters)
|
| get_converter(self, parameter)
| Generate unit conversion function for given parameter
|
| get_parameter(self, parameter)
| Return a dict for given parameter
|
| unit(self, parameter)
| Get the unit for given parameter
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| names
| A list of parameter names
The names attribute gives you a list of the available parameters:
>>> len(db.parameters.names)
277
>>> db.parameters.names[:5]
['led_model', 'pmt_serialnumber', 'bps_breaker', 'humid',
'pwr_meas[9] power_measurement_12v_lvl']
The above example shows the first 5 parameters out of 277 entries.
If you see a number enclosed in brackets in a parameter name, like "pwr_meas[9] power_measurement_12v_lvl" in the list above, it means that "pwr_meas" is a parameter array and the value at index 9 is aliased to power_measurement_12v_lvl. The latter name should be used if you want to retrieve the corresponding data from the DB.
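If you need to take such names apart yourself, a hypothetical helper (not part of km3pipe) could parse them like this:

```python
import re

def split_alias(name):
    """Split a name like 'pwr_meas[9] power_measurement_12v_lvl' into
    (array_name, index, alias); return None for a plain parameter name.

    Hypothetical helper, not part of km3pipe.
    """
    match = re.match(r"(\w+)\[(\d+)\]\s+(\w+)", name)
    if match is None:
        return None
    array_name, index, alias = match.groups()
    return array_name, int(index), alias

print(split_alias("pwr_meas[9] power_measurement_12v_lvl"))
# ('pwr_meas', 9, 'power_measurement_12v_lvl')
print(split_alias("humid"))  # None
```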
Parameter Units and Value Conversions¶
The ParametersContainer has three methods to access information about a given parameter.
The get_converter() method returns a function which converts the raw values stored for a given parameter to the target unit, which in turn is returned by the unit() method:
>>> humid_converter = db.parameters.get_converter("humid")
>>> humid_converter(987)
9.870000000000001
>>> db.parameters.unit("humid")
'%'
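The example above suggests that this particular conversion is a simple scale factor (987 → 9.87, i.e. a factor of 0.01). A hedged sketch of what such a converter factory may look like (the actual conversion factors live in the database; this is not km3pipe's implementation):

```python
def make_scale_converter(scale):
    """Build a unit-conversion function which multiplies raw values
    by `scale`. A simplified stand-in for db.parameters.get_converter().
    """
    def convert(raw_value):
        return raw_value * scale
    return convert

# Assumed scale of 0.01 for the 'humid' parameter (raw counts -> %):
humid_converter = make_scale_converter(0.01)
print(humid_converter(987))  # 9.870000000000001
```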
Retrieving Parameter Data¶
The datalog method provides an easy way to retrieve data for a given detector and a run or a range of runs. It returns a pandas DataFrame instance:
>>> humid = db.datalog("humid", run=4780, det_id="D_ARCA003")
Database lookup took 3.931s (CPU 0.192s).
>>> type(humid)
<class 'pandas.core.frame.DataFrame'>
The head() and tail() methods can be used to get the first or last rows:
>>> humid.head(3)
RUN UNIXTIME SOURCE_NAME PARAMETER_NAME DATA_VALUE \
0 4780 1478735722766 3.4.3.2/V2-2-1/2.138 humid 3694
1 4780 1478735732768 3.4.3.2/V2-2-1/2.138 humid 3694
2 4780 1478735742766 3.4.3.2/V2-2-1/2.138 humid 3694
DATETIME VALUE
0 2016-11-09 23:55:22.766000+00:00 36.94
1 2016-11-09 23:55:32.768000+00:00 36.94
2 2016-11-09 23:55:42.766000+00:00 36.94
The DATA_VALUE column holds the recorded data (the “raw values”). The VALUE column is automatically added by the DBManager – if the parameter has a valid unit and conversion entry in the database – by applying the above-mentioned get_converter() method to the DATA_VALUE column.
If the data contains a UNIXTIME column, a DATETIME field is added too, which allows using all the magical date filtering methods.
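Here is a self-contained sketch of that date filtering, using made-up datalog-like data (the UNIXTIME values are in milliseconds, as in the example above):

```python
import pandas as pd

# Hypothetical datalog-like data with UNIXTIME in milliseconds.
humid = pd.DataFrame({
    "UNIXTIME": [1478735722766, 1478735732768, 1478735742766],
    "VALUE": [36.94, 36.94, 36.95],
})

# This mirrors how a DATETIME column can be derived from UNIXTIME.
humid["DATETIME"] = pd.to_datetime(humid.UNIXTIME, unit="ms", utc=True)

# Date-based filtering: keep only the rows after a given timestamp.
cutoff = pd.Timestamp("2016-11-09 23:55:30", tz="UTC")
late = humid[humid.DATETIME > cutoff]
print(len(late))  # 2
```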
StreamDS¶
You already learned how to use the DBManager to connect to the database and access information. The StreamDS class is a specific helper which connects to the StreamDS (Stream Data Service) of the KM3NeT database web server interface. The StreamDS is used to retrieve large datasets which can possibly reach and exceed GB size.
StreamDS uses the DBManager to connect to the database and is instantiated the same way:
>>> import km3pipe as kp
>>> sds = kp.db.StreamDS()
Please enter your KM3NeT DB username: tgal
Password:
Request permanent session? (y/n)y
Notice that you won’t be asked for the password or session if you already put your credentials into your ~/.km3net configuration or created a permanent session before (and your IP has not changed since then).
If you type sds. and press <TAB>, you will see a list of the available methods and getters for all available streams. The methods are generated dynamically, so the list is always up to date with the latest web API:
>>> sds.
sds.ahrs( sds.pmt_available_hvtuned_sets(
sds.clbmap( sds.pmt_best_hv_settings(
sds.clbmon( sds.pmt_hv_run_settings(
sds.clbmondomid( sds.pmt_hv_settings(
sds.clbmonpos( sds.pmt_hv_tuning_settings(
sds.clbmonupi( sds.pmtdarkbox(
sds.datalogevents( sds.print_streams(
sds.datalognumbers( sds.runs(
sds.datalogstrings( sds.runsummarynumbers(
sds.detcalibrations( sds.streams
sds.detectors( sds.t0(
sds.dmvars( sds.t0sets(
sds.get( sds.toa(
sds.integration( sds.toashort(
sds.jobs( sds.upi(
sds.mandatory_selectors( sds.vendorhv(
sds.optional_selectors( sds.vendorhvrunsetup(
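Attaching methods dynamically like this can be sketched with closures and setattr; note that this is only an illustration of the technique and km3pipe’s actual implementation may differ:

```python
class StreamDSSketch:
    """Toy class which attaches one getter per stream at runtime."""

    def __init__(self, streams):
        self.streams = streams
        for stream in streams:
            # Each stream name becomes a callable attribute.
            setattr(self, stream, self._make_getter(stream))

    def _make_getter(self, stream):
        def getter(**selectors):
            # A real implementation would query the web API here.
            return f"GET {stream} with {selectors}"
        getter.__name__ = stream
        return getter

sds = StreamDSSketch(["detectors", "runs"])
print(sds.runs(detid=14))  # GET runs with {'detid': 14}
```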
To get a full list of available streams:
>>> sds.streams
['detectors', 'runs', 'jobs', 'datalognumbers', 'datalogstrings',
'datalogevents', 'vendorhv', 'vendorhvrunsetup', 't0sets', 't0',
'ahrs', 'upi', 'pmtdarkbox', 'dmvars', 'detcalibrations',
'pmt_hv_settings', 'pmt_hv_tuning_settings', 'pmt_hv_run_settings',
'pmt_best_hv_settings', 'pmt_available_hvtuned_sets', 'integration',
'clbmon', 'clbmonupi', 'clbmondomid', 'clbmonpos', 'clbmap', 'toa',
'toashort', 'runsummarynumbers']
To print all streams including their selectors and data formats, use the sds.print_streams() function:
>>> sds.print_streams()
detectors
Shows all the detectors, optionally selecting by site oid or city.
available formats: txt
mandatory selectors: -
optional selectors: locationid,city
runs
Shows all runs for a detector (mandatory selection by detid or serialnumber). Optionally, a single run may be specified.
available formats: txt
mandatory selectors: detid
optional selectors: run
jobs
Shows all detector run jobs for a detector within a minimum and maximum Unix time (all mandatory selections). Optionally, selections may consider priority, runsetupid, oid.
available formats: txt
mandatory selectors: detid,unixmintime,unixmaxtime
optional selectors: priority,runsetupid,oid,localid
...
...
...
If you are using ipython (recommended), you can get quick help by typing e.g. sds.vendorhv? to see what the vendorhv stream does and which selectors it needs (if you are using the plain python REPL, type help(sds.vendorhv) instead). Also notice that some completion features are only supported for Python 3.3+ (you should update to Python 3.6 anyways…):
>>> sds.vendorhv?
Signature: sds.vendorhv(detid, *, pmtserial)
Docstring: Shows vendor-suggested HV for a detector (mandatory selection by detid or serialnumber). Optionally, a single PMT may be specified.
File: ~/Dev/km3pipe/km3pipe/db.py
Type: function
As you can see, the Signature indicates that detid is mandatory and the keyword(s) after the * are optional (in this case pmtserial).
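The * in the signature is standard Python keyword-only syntax. A tiny sketch with a hypothetical function of the same shape:

```python
def vendorhv(detid, *, pmtserial=None):
    """Sketch of a generated stream method: detid is positional-or-keyword,
    everything after * (here pmtserial) is keyword-only and optional."""
    return {"detid": detid, "pmtserial": pmtserial}

print(vendorhv(detid=14))            # {'detid': 14, 'pmtserial': None}
print(vendorhv(14, pmtserial=1838))  # {'detid': 14, 'pmtserial': 1838}
# vendorhv(14, 1838) would raise a TypeError:
# pmtserial must be passed by keyword.
```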
Let’s retrieve some data:
>>> sds.vendorhv(detid=14)
DUID FLOORID CABLEPOS PMTSERIAL PMT_SUPPLY_VOLTAGE
0 1 1 0 1838 -1010
1 2 1 0 704 -1080
2 3 1 0 5586 -1030
3 2 1 1 6461 -990
4 3 1 1 6483 -1100
5 1 1 1 4944 -930
That’s it. You always get a pandas DataFrame back. Have fun!
The streamds CLI¶
There is also a command line utility called streamds, which can be used to interact with the database directly from the shell:
$ streamds --help
Access the KM3NeT StreamDS DataBase service.
Usage:
streamds
streamds list
streamds upload CSV_FILE
streamds info STREAM
streamds get STREAM [PARAMETERS...]
streamds (-h | --help)
streamds --version
Options:
STREAM Name of the stream.
CSV_FILE Tab separated data for the runsummary tables.
PARAMETERS List of parameters separated by space (e.g. detid=29).
-h --help Show this screen.
Uploading “runsummarynumbers”¶
You can use the streamds upload CSV_FILE command to upload data to the “runsummarynumbers” meta table of the KM3NeT database. Please discuss any new types of parameters with the database experts in advance and create a wiki page which describes them in detail.
The required columns are run, det_id and source. The source column is a free string-type column. It is recommended to use the DOM ID if you have parameters which refer to DOMs. If you have a column which refers to the whole run, use the string "run" in the source column; for a parameter which refers to a DU, you can set it to e.g. "du1".
Here is an example of a CSV file:
det_id run_id source n_active_doms highest_rate
29 523 whole_run 18 230042
29 523 du1 3 123000
29 524 whole_run 17 500023
Please note that the whole file will be rejected if there is even a single row of data which is already present in the database.
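A minimal sketch of writing such a tab-separated file with Python’s csv module (the column names follow the example above):

```python
import csv

# Rows shaped like the example CSV above; the file must be tab separated.
rows = [
    {"det_id": 29, "run_id": 523, "source": "whole_run",
     "n_active_doms": 18, "highest_rate": 230042},
    {"det_id": 29, "run_id": 523, "source": "du1",
     "n_active_doms": 3, "highest_rate": 123000},
]

with open("runsummary.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)
```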