CEP 2 - Event Structure
Status: accepted
Discussion: [#2304]
Date accepted: 2023-10-26
Last revised: 2023-09-08
Author: Maximilian Linhoff
Created: 2023-04-04
Abstract
Currently, the hierarchy of the ctapipe ArrayEventContainer has the data levels first, and each data level then has a Map .tel for telescope-specific information in subcontainers.
This CEP proposes to change this structure so that a TelescopeEventContainer is the parent for all telescope-wise information and in turn contains the data levels.
This has the following advantages:
- At lower data levels, many processing steps can be performed independently for each telescope event, which is easier if all data from a single telescope is in a single container.
- The currently proposed ACADA-DPPS ICD foresees event and monitoring files per telescope. This maps nicely to readers filling one TelescopeEventContainer per telescope; these are then joined together with array-level information into an ArrayEventContainer.
Proposed New Structure
This CEP proposes to change the current layout of ArrayEventContainer from having multiple data levels, each with a Map containing telescope-wise data, to a structure where each ArrayEventContainer is composed of one or more TelescopeEvents containing all telescope-wise information for all data levels.
The ArrayEventContainer should also be renamed to SubarrayEventContainer to match other naming patterns in ctapipe, such as the SubarrayDescription, making it clear that the array is split into multiple subarrays, each observing its own observation block.
The main structure after the change will look like this (the Container suffix is left out for readability):
SubarrayEvent
- index: SubarrayEventIndex
- simulation: SimulatedShower
- dl0: DL0Subarray
- dl1: DL1Subarray
- dl2: DL2Subarray
- dl3: DL3Subarray
- tel: Map[tel_id -> TelescopeEvent]
TelescopeEvent
- index: TelescopeEventIndex
- simulation: TelescopeSimulation
- r0: R0Telescope
- r1: R1Telescope
- dl0: DL0Telescope
- dl1: DL1Telescope
- dl2: DL2Telescope
- dl3: DL3Telescope
Each data level container has specific fields and/or subcontainers, including monitoring information (interpolated or chosen for that specific event).
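To make the nesting concrete, the layout can be mimicked with plain dataclasses. This is only an illustrative sketch: the classes below are hypothetical stand-ins, not ctapipe's actual Container classes, and the per-level payloads are left as untyped placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TelescopeEvent:
    """Hypothetical stand-in for TelescopeEventContainer."""

    index: Optional[Any] = None
    simulation: Optional[Any] = None
    r0: Optional[Any] = None
    r1: Optional[Any] = None
    dl0: Optional[Any] = None
    dl1: Optional[Any] = None
    dl2: Optional[Any] = None
    dl3: Optional[Any] = None


@dataclass
class SubarrayEvent:
    """Hypothetical stand-in for SubarrayEventContainer."""

    index: Optional[Any] = None
    simulation: Optional[Any] = None
    dl0: Optional[Any] = None
    dl1: Optional[Any] = None
    dl2: Optional[Any] = None
    dl3: Optional[Any] = None
    # tel_id -> TelescopeEvent, holding all telescope-wise data levels
    tel: Dict[int, TelescopeEvent] = field(default_factory=dict)


# Telescope-wise data lives under event.tel[tel_id].<data level>
event = SubarrayEvent()
event.tel[1] = TelescopeEvent(dl1={"image": [1.0, 2.0, 3.0]})
event.tel[4] = TelescopeEvent(dl1={"image": [0.5, 0.0, 2.5]})
image = event.tel[1].dl1["image"]
```

Note that r0 and r1 appear only at the telescope level, while dl0 to dl3 exist at both levels, mirroring the listing above.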
Advantages of the New Structure
The proposed scheme makes it easier to parallelize over array events and to move loops over telescopes out of code paths that only deal with a single telescope.
For example, in the CameraCalibrator, the ImageProcessor and other classes, we currently have to provide the full ArrayEventContainer so that higher data levels can be filled from lower data levels, although only one telescope is processed at a time.
In the current scheme (simplified from ctapipe-process) it looks like this:
for array_event in source:
    calibrator(array_event)
    # calibrator internally has two hidden loops like this:
    # for tel_id, r1 in array_event.r1.tel.items():
    #     calibrate r1 to dl0
    # for tel_id, dl0 in array_event.dl0.tel.items():
    #     calibrate dl0 to dl1
    image_processor(array_event)
    # image processor also has an internal loop over the telescope events
    # for tel_id, dl1 in array_event.dl1.tel.items():
    #     image cleaning and parametrization
    shower_processor(array_event)
This looks simple, but there are hidden loops over the telescopes in both the CameraCalibrator and the ImageProcessor, although neither of them accesses any subarray-wide data at all.
Using the new structure, these classes will get a single TelescopeEventContainer, and the loop can be moved outside those classes to a single place:
for array_event in source:
    for telescope_event in array_event.tel.values():
        calibrator(telescope_event)
        image_processor(telescope_event)
    shower_processor(array_event)
This clearly separates the components working on the telescope level from the ones working on the subarray level.
With the hidden loops removed from the telescope-level components, it would now also be easy to parallelize the processing of telescope events:
from multiprocessing.pool import ThreadPool

def process_telescope_event(telescope_event):
    calibrator(telescope_event)
    image_processor(telescope_event)

with ThreadPool(8) as pool:
    for array_event in source:
        # process all telescope events of this array event in parallel
        pool.map(process_telescope_event, array_event.tel.values())
        shower_processor(array_event)
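The pattern above can be tried out without ctapipe. The following toy sketch replaces the calibrator, image processor, shower processor and event source with trivial stand-ins (all hypothetical; only the names are borrowed from the text) to show that each telescope event can be filled in place by a worker thread before the subarray-level step runs.

```python
from multiprocessing.pool import ThreadPool
from types import SimpleNamespace


def calibrator(telescope_event):
    # stand-in for calibration: scale the r1 samples
    telescope_event.dl1 = [2 * x for x in telescope_event.r1]


def image_processor(telescope_event):
    # stand-in for image parametrization: sum of the "image"
    telescope_event.size = sum(telescope_event.dl1)


def process_telescope_event(telescope_event):
    calibrator(telescope_event)
    image_processor(telescope_event)


def shower_processor(array_event):
    # subarray-level step: combines the results of all telescopes
    array_event.total_size = sum(t.size for t in array_event.tel.values())


# stand-in for an event source yielding two array events
source = [
    SimpleNamespace(tel={1: SimpleNamespace(r1=[1, 2]), 2: SimpleNamespace(r1=[3, 4])}),
    SimpleNamespace(tel={3: SimpleNamespace(r1=[5])}),
]

with ThreadPool(2) as pool:
    for array_event in source:
        # map() blocks until every telescope event has been processed
        pool.map(process_telescope_event, array_event.tel.values())
        shower_processor(array_event)

totals = [array_event.total_size for array_event in source]
```

Because `pool.map` returns only after all telescope events are done, the subarray-level step always sees fully processed telescope events. Whether threads give a real speed-up in practice depends on how much of the work runs in code that releases Python's global interpreter lock.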
It also makes writing EventSource implementations simpler, as reading data of different telescopes might require opening multiple files (as, e.g., foreseen for the CTAO DL0 files).
The information from each of those files could be read into independent TelescopeEvent instances, which are then joined into a single SubarrayEvent.
Since sim_telarray files use the same organization, it might also simplify some code in the SimTelEventSource.
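The joining step could look roughly like the sketch below. It assumes, purely for illustration, that each per-telescope file can be read as an iterable of (event_id, telescope_event) pairs; the function name and the dict-based result are hypothetical and not part of ctapipe. For simplicity it collects everything in memory, whereas a real EventSource would more likely stream events.

```python
from collections import defaultdict


def join_telescope_events(telescope_streams):
    """Group per-telescope events by event_id.

    telescope_streams maps tel_id to an iterable of
    (event_id, telescope_event) pairs, one iterable per telescope file.
    Returns one dict per subarray event, ordered by event_id.
    """
    joined = defaultdict(dict)
    for tel_id, stream in telescope_streams.items():
        for event_id, telescope_event in stream:
            joined[event_id][tel_id] = telescope_event

    return [
        {"event_id": event_id, "tel": joined[event_id]}
        for event_id in sorted(joined)
    ]


# Telescope 1 took part in events 10 and 11, telescope 2 only in event 11
streams = {
    1: [(10, "tel1-ev10"), (11, "tel1-ev11")],
    2: [(11, "tel2-ev11")],
}
subarray_events = join_telescope_events(streams)
```

Each reader only needs to know about its own telescope; the subarray-level information is attached once, after the join.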
For code directly accessing information from the array event, this mostly means inverting the order of .tel and the data level.
Before: event.dl1.tel[1].image
After: event.tel[1].dl1.image
Before:
hillas_dicts = {
    tel_id: dl1.parameters.hillas
    for tel_id, dl1 in event.dl1.tel.items()
    if all(self.quality_query(parameters=dl1.parameters))
}
After:
hillas_dicts = {
    tel_id: tel_event.dl1.parameters.hillas
    for tel_id, tel_event in event.tel.items()
    if all(self.quality_query(parameters=tel_event.dl1.parameters))
}
Or in our loops, code like this:
for tel_id in event.trigger.tels_with_trigger:
    dl0 = event.dl0.tel[tel_id]
    dl1 = event.dl1.tel[tel_id]
    # do something with dl0 and dl1
will become:
for telescope_event in event.tel.values():
    dl0 = telescope_event.dl0
    dl1 = telescope_event.dl1
    # do something with dl0 and dl1
which is more idiomatic Python and does not require repeated lookups via tel_id.
Previous Discussions
This issue has previously been discussed across multiple issues, most importantly #1165, but also #1301 and #722.
Advantages of the Old Structure
By having the data level first in the hierarchy, it is easier to drop certain data levels for all telescopes.
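Under the proposed structure, dropping a data level for all telescopes would need a loop over the telescope events instead. A minimal sketch, using SimpleNamespace objects as stand-ins for the containers and a hypothetical helper name:

```python
from types import SimpleNamespace


def drop_data_level(subarray_event, level):
    """Clear one data level at subarray level and for every telescope."""
    # r0 and r1 only exist at telescope level, so check before clearing
    if hasattr(subarray_event, level):
        setattr(subarray_event, level, None)
    for telescope_event in subarray_event.tel.values():
        setattr(telescope_event, level, None)


event = SimpleNamespace(
    dl1="subarray dl1",
    tel={
        1: SimpleNamespace(r1="waveforms 1", dl1="image 1"),
        2: SimpleNamespace(r1="waveforms 2", dl1="image 2"),
    },
)

# Drop the R1 data of all telescopes, e.g. before writing DL1 output
drop_data_level(event, "r1")
```

The extra loop is small, but it is the cost this section refers to: what used to be a single attribute reset now touches every telescope event.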