
1

Design of DAQ Data Flow

Ryosuke Itoh, KEK

2

Data processing in DAQ

[Figure: DAQ overview. Front-end digitizers on the sub-detectors (SVD, CDC, PID, ECL, KLM) send data over Belle2link (Rocket IO over fiber) from the near-detector E-hut to the DAQ server room: ~300 R/O boards (COPPERs, ~0.5M channels) feed ~30 R/O PCs, Event Builder 1, the HLT farms (~O(10) units of ~400 cores/unit), Event Builder 2, and recorders writing to RAID (~10 units). The PXD has its own front-end digitizers and readout box feeding Event Builder 2 directly. All data processing in the DAQ runs on Linux CPUs.]

3

COPPER

[Figure: COPPER readout board, form factor VME 9U. Four generic FINESSE slots host digitizer cards (Belle2link receivers), each with a readout FIFO feeding an on-board online processor; a trigger input and on-board Ethernet are also provided. A generic PMC slot carries the processor PMC: a Linux CPU (ATOM 1.6 GHz, 512 MB memory) with two 1000Base-T ports, which ships the data over the network to the event builder.]

4

Belle2link and "remote" FINESSE

[Figure: The FPGA on the detector readout card implements a signal receiver, digitizer, data reduction, L1 FIFO, the Belle2link transmitter core, and control registers; data and slow control travel over optical fibers to the Belle2link receiver sitting in a FINESSE slot on the COPPER.]

- In the FPGA on the detector front-end card, a "virtual" FINESSE is implemented, and it talks to the Belle2link transmitter core.
- On the COPPER, a Belle2link receiver (HSLB) is installed in place of a digitizer FINESSE and is connected to the front-end card via optical fibers.
- The receiver remote-controls the virtual FINESSE (slow control) and receives the data stream over the optical fibers, as if the remote FINESSE were mounted on the COPPER itself.

5

Data flow : Belle II software framework (basf2) + OO data transfer

- Raw data are stored in ROOT objects by the COPPER CPU.
- Objects are streamed and transferred between nodes using the "B2Socket" class.

[Figure: COPPERs (readout modules), R/O PCs, the Event Builder, and HLT nodes all run basf2 in a unified framework. On the COPPER, b2link-receive, format, and reduction modules produce objects sent out over B2Socket; on the R/O PC, receive, format, and monitor modules pass objects through a ring buffer to a sender; on the HLT nodes, event-server, tracking, clustering, PID, selection, and output-server modules process the objects, all connected by B2Socket tx/rx pairs.]
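B2Socket's actual interface is not shown on these slides; purely as an illustration of the underlying mechanism (ROOT-streaming an object and pushing the byte stream through a TCP socket), a generic ROOT sketch could look like the following. The host name, port, and TH1F payload are placeholders, not the real DAQ objects.

// Sketch only: generic ROOT object transfer over TSocket/TMessage.
// B2Socket wraps this kind of mechanism; all names below are illustrative.
#include "TSocket.h"
#include "TMessage.h"
#include "TH1F.h"

void send_one_object()
{
  TSocket sock("ropc01.example", 9090);            // hypothetical receiver host/port
  if (!sock.IsValid()) return;

  TH1F payload("h", "dummy payload", 100, 0., 1.); // stand-in for a raw-data object
  TMessage msg(kMESS_OBJECT);
  msg.WriteObject(&payload);                       // ROOT streams the object into the message
  sock.Send(msg);                                  // byte stream goes over the TCP connection
}

void receive_one_object(TSocket& sock)
{
  TMessage* msg = nullptr;
  if (sock.Recv(msg) <= 0 || !msg) return;
  TObject* obj = msg->ReadObject(msg->GetClass()); // destream back into an object
  // ... register obj in the local object store ...
  delete msg;
}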

6

basf2 : Belle II's universal software framework

* Built-in parallel event processing utilizing multi-core CPUs

[Figure: In basf2, each event is processed by a chain of modules (a "path"): input module, module 1, module 2, ..., output module, all exchanging objects through the DataStore (object manager), reading from an input ROOT file and writing to an output ROOT file. In parallel mode, the path is split into an input path (input process: input module + tx), event processes (rx, module chain, tx; one per core) fed through a RingBuffer, and an output path (output process: rx + output module) collecting the results through a second RingBuffer.]
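The module/path idea can be illustrated with a minimal, framework-independent sketch; the class names, the trivial map used as a stand-in for the DataStore, and the example modules are all invented here and are not the basf2 API.

// Minimal illustration of a basf2-style module chain ("path").
// All names are hypothetical; the real framework also provides the DataStore
// object manager, module parameters, parallel paths, etc.
#include <map>
#include <memory>
#include <string>
#include <vector>
#include <iostream>

using DataStore = std::map<std::string, double>;   // stand-in for the object manager

class Module {
public:
  virtual ~Module() = default;
  virtual void event(DataStore& ds) = 0;            // called once per event
};

class InputModule : public Module {
  void event(DataStore& ds) override { ds["rawValue"] = 42.0; }              // pretend to read an event
};

class CalibModule : public Module {
  void event(DataStore& ds) override { ds["calValue"] = ds["rawValue"] * 1.5; }
};

class OutputModule : public Module {
  void event(DataStore& ds) override { std::cout << "out: " << ds["calValue"] << "\n"; }
};

int main()
{
  // A "path" is just an ordered chain of modules applied to every event.
  std::vector<std::unique_ptr<Module>> path;
  path.push_back(std::make_unique<InputModule>());
  path.push_back(std::make_unique<CalibModule>());
  path.push_back(std::make_unique<OutputModule>());

  for (int ev = 0; ev < 3; ++ev) {   // event loop
    DataStore ds;                    // fresh store per event
    for (auto& m : path) m->event(ds);
  }
}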

7

Object transport : Basics of data flow in Belle II DAQ

- To transport event data between nodes, the objects in the DataStore are first streamed, event by event, into a byte-stream record (EvtMessage) using ROOT's TMessage.
- The EvtMessage is transferred over a network socket connection (B2Socket).
- The received EvtMessage is destreamed, and the objects are restored in the DataStore of the receiving node.
- A ring buffer is used to distribute events to multiple nodes, giving automatic load balancing.

[Figure: On the event source node, the ROOT objects of each event (event 1, 2, 3, ...) are streamed by TMessage into a RingBuffer built on Linux IPC shared memory; sender (tx) processes ship the EvtMessages over B2Socket to receiver (rx) processes on the receiver nodes, where the objects are restored into basf2.]
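The load-balancing idea behind the RingBuffer distribution is one producer and several consumers, each taking the next available event. The sketch below shows that pattern with fork() and a pipe carrying fixed-size records, purely to keep the example short; the real implementation uses a ring buffer in Linux IPC shared memory, and every name here is illustrative.

// Sketch of the one-producer / many-consumer pattern behind the RingBuffer
// distribution (automatic load balancing).  A pipe with fixed-size records
// stands in for the shared-memory ring buffer.
#include <unistd.h>
#include <sys/wait.h>
#include <cstdio>

struct EvtRecord { int eventNo; char payload[60]; };   // fixed size -> atomic pipe writes

int main()
{
  int fd[2];
  if (pipe(fd) != 0) return 1;

  const int nWorkers = 3;
  for (int w = 0; w < nWorkers; ++w) {
    if (fork() == 0) {                     // worker ("event process")
      close(fd[1]);
      EvtRecord rec;
      while (read(fd[0], &rec, sizeof rec) == sizeof rec)   // take the next available event
        printf("worker %d (pid %d) processed event %d\n", w, (int)getpid(), rec.eventNo);
      _exit(0);
    }
  }

  close(fd[0]);                            // producer ("input process")
  for (int ev = 0; ev < 12; ++ev) {
    EvtRecord rec{ev, {}};
    snprintf(rec.payload, sizeof rec.payload, "event %d", ev);
    write(fd[1], &rec, sizeof rec);        // whichever idle worker reads first gets it
  }
  close(fd[1]);                            // EOF lets the workers exit
  while (wait(nullptr) > 0) {}             // reap workers
}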

8

Data processing on COPPER CPU

[Figure: On the COPPER, basf2 runs an input module reading from the Belle2link receiver FINESSE, followed by formatting (ROOTize), data-reduction and monitoring modules, and an output module that sends the EvtMessage over B2Socket; a CPR-NSM daemon on the control network receives alerts from the modules.]

- Format the raw data read from Belle2link into a ROOT object.
- Perform data reduction and monitoring if necessary.
- Transfer the raw data object to the readout PC through B2Socket.
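The slide leaves the reduction step open ("if necessary"). As one hedged illustration of where such a module would act, a zero-suppression pass over the 32-bit word array could look like this; the word layout is hypothetical, since the real payload format is detector-specific (see the raw-data-format slide).

// Illustrative data-reduction step on the COPPER CPU: drop empty words from
// the detector payload.  Any real reduction is detector-specific; this only
// shows the shape of such a module acting on the 32-bit word array.
#include <cstdint>
#include <vector>

std::vector<uint32_t> zeroSuppress(const std::vector<uint32_t>& raw)
{
  std::vector<uint32_t> reduced;
  reduced.reserve(raw.size());
  for (uint32_t word : raw)
    if (word != 0) reduced.push_back(word);   // keep only non-empty channels/words
  return reduced;
}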

9

Event Building

[Figure: On the readout PC / EVB output node, the basf2 kernel spawns receiver processes (rx1 ... rxn), one per COPPER, which fill the DataStore; data-reduction and monitoring modules and a tx module then send the built event over B2Socket to the EVB/HLT, with an EVB-NSM daemon handling alerts.]

* Event building is performed in the "DataStore" object manager of basf2.
* The raw data objects from the COPPERs are registered in the DataStore, which manages the list of objects; the built event is then streamed again and transferred to the next node over B2Socket.
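In the real system this bookkeeping is done by registering RawCOPPER objects in the basf2 DataStore (the StoreArray usage appears on a later slide). Purely as a stand-alone illustration of what "collecting one fragment per COPPER for each event" involves, a sketch with invented types could read:

// Illustrative event-building bookkeeping: collect one fragment per COPPER
// for each event number and hand the event off once all fragments arrived.
// Types and names are hypothetical, not the basf2 implementation.
#include <cstdint>
#include <map>
#include <vector>

struct Fragment { int copperId; std::vector<uint32_t> words; };

class EventBuilder {
public:
  explicit EventBuilder(std::size_t nCoppers) : m_nCoppers(nCoppers) {}

  // Returns true (and fills 'event') once all fragments of that event are present.
  bool add(uint32_t eventNo, Fragment frag, std::vector<Fragment>& event)
  {
    auto& frags = m_pending[eventNo];
    frags.push_back(std::move(frag));
    if (frags.size() < m_nCoppers) return false;
    event = std::move(frags);
    m_pending.erase(eventNo);
    return true;
  }

private:
  std::size_t m_nCoppers;
  std::map<uint32_t, std::vector<Fragment>> m_pending;   // event number -> fragments so far
};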

10

Parallel Event Building utilizing a multicore CPU

- Sequential receiving of event fragments from B2Socket is a possible bottleneck because of the CPU consumption of object destreaming.
- "Parallel" receiving of event fragments, utilizing the built-in parallel processing function of basf2, is being developed.

[Figure: Several receiver processes (one per core), each with its own DataStore (DataStore 1 ... DataStore m), destream subsets of the incoming fragments (rx1, rx2, ...; rxa, rxb, ...; rxj, rxk, ...) and push them through RingBuffers (Linux IPC) to a collector process with DataStore n, which runs data-reduction and monitoring modules and sends the built event over B2Socket to the EVB/HLT.]
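A rough single-node illustration of why parallel receiving helps: the destreaming cost is spread over several cores. The sketch below uses std::thread and a dummy decode function as a stand-in for ROOT destreaming; the actual design uses basf2's process-based parallelism and ring buffers as drawn above.

// Illustration only: spread the CPU cost of destreaming event fragments over
// several cores.  "decodeFragment" is a stand-in for object destreaming.
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

using Buffer   = std::vector<char>;
using Fragment = std::vector<uint32_t>;

Fragment decodeFragment(const Buffer& in)          // stand-in for ROOT destreaming
{
  Fragment out(in.size() / sizeof(uint32_t));
  std::memcpy(out.data(), in.data(), out.size() * sizeof(uint32_t));
  return out;
}

std::vector<Fragment> decodeInParallel(const std::vector<Buffer>& buffers)
{
  std::vector<Fragment> fragments(buffers.size());
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < buffers.size(); ++i)        // one decoder per fragment source
    workers.emplace_back([&, i] { fragments[i] = decodeFragment(buffers[i]); });
  for (auto& t : workers) t.join();
  return fragments;                                       // ready for event building
}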

11

[Figure: One HLT unit consists of an event separator receiving from the detector R/O side of the event builder, ~20 worker nodes (each running basf2 in multicore mode with an event receiver, a ring buffer, and an event sender, connected by B2Socket), and an event merger feeding Event Builder 2 together with the pixel detector R/O; O(10) such HLT units run in parallel, and histogram memory is shipped to a DQM server.]

High Level Trigger (HLT)

- Unit structure (O(10) units):
  * to reduce the number of output ports of the event builder
  * to keep up with the gradual luminosity increase
  * fault tolerance: each unit is completely independent
- Based on the parallel processing technology developed for basf2

12

Prompt Reco Design

[Figure: Data flow from HLT to Prompt Reco. HLT (real time): event builder, formatting, tracking (SVD+CDC), calorimeter clustering, PID, physics skim, Level 4 selection, and PXD-finder; together with the PXD R/O and Event Builder 2, the output goes to online storage. Express Reco (real time): additional processing feeding PXD/VTX DQM histograms into DQM histogram storage. Prompt Reco (Rec nodes): tracking (SVD+CDC), calorimeter clustering, PID, vertexing (PXD+SVD+CDC), DST production, and a scaled Bhabha+µµ skim used for constants making; run-by-run calibration constants are produced within 36 hours after the run, "frozen" calibration constants for alignment are updated experiment by experiment, and the DST follows 48 hours after the run. Arrows distinguish the main data flow (raw data), histograms, the scaled skim, and the calibration constants.]

13

- Prompt Reco is still marginal.
  * It is basically a computing issue (not in DAQ territory), and it is strongly coupled to the DST production strategy.
  * The computing people are very busy with GRID business and seem to have little interest for now.

- Express Reco is already in DAQ territory, and we will surely have it for DQM (mainly for the PXD, but also for other purposes).

* Prompt Reco processing is quite important, especially for early availability of DSTs with PXD, and also for the software trigger strategy (my opinion). If you think so, please support it.

14

Streaming raw data

* Raw event data have to be stored in the "DataStore" as objects so that they can be managed by basf2.
* To transfer the raw data to a different node, the DataStore contents have to be streamed (serialized) and destreamed (deserialized).
* Streaming using ROOT is reported to be CPU consuming -> could be an issue for the COPPER CPU (ATOM @ 1.6 GHz).
* Use the simplest structure for the raw data: a "variable-sized array of integers".
  - The overhead of ROOT streaming can be minimized.
  - If it is still slow, an optimized streamer (= hand-written streamer) will be implemented, as sketched below.
  - Data access goes through an accessor class (just like Belle's TdcUnpacker class).
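As an indication of what such a hand-written streamer for a variable-sized array of 32-bit words could look like (sketch only; the actual RawCOPPER streamer, byte ordering, and record headers are not specified on these slides):

// Sketch of a hand-written streamer: a 4-byte word count followed by the
// words themselves.  It only illustrates the idea of bypassing the generic
// ROOT streamer for this very simple data structure.
#include <cstdint>
#include <cstring>
#include <vector>

std::vector<char> streamWords(const std::vector<uint32_t>& words)
{
  std::vector<char> buf(sizeof(uint32_t) * (1 + words.size()));
  const uint32_t n = static_cast<uint32_t>(words.size());
  std::memcpy(buf.data(), &n, sizeof n);                                    // word count
  std::memcpy(buf.data() + sizeof n, words.data(), n * sizeof(uint32_t));   // payload
  return buf;
}

std::vector<uint32_t> destreamWords(const std::vector<char>& buf)
{
  uint32_t n = 0;
  std::memcpy(&n, buf.data(), sizeof n);
  std::vector<uint32_t> words(n);
  std::memcpy(words.data(), buf.data() + sizeof n, n * sizeof(uint32_t));
  return words;
}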

15

Raw data format

- Raw data from the detectors (except the PXD) are first received by the COPPERs through Belle2link.
- The data are treated as a variable-sized array of 32-bit integers, encapsulated in a COPPER frame with FINESSE headers/trailers.
- The format inside the array is up to each detector group, since it depends on the digitizer of the readout electronics.
- The array is stored in the RawCOPPER object as is; an accessor sketch follows the frame layout below.

[Figure: COPPER frame layout: COPPER header, then up to four blocks of FINESSE header + raw data from Belle2link 1..4 + FINESSE trailer, then COPPER trailer.]
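The previous slide mentions access through an accessor class (like Belle's TdcUnpacker). A hedged sketch of such a wrapper around the COPPER word array could look like the following; the offsets, header assumptions, and method names are invented, since the real RawCOPPER layout is defined by the COPPER/FINESSE headers above.

// Illustrative accessor around the raw 32-bit word array of one COPPER event.
// Offsets, header lengths and method names are hypothetical; the payload
// format inside each FINESSE block is owned by the detector group.
#include <cstdint>
#include <vector>

class RawCopperAccessor {
public:
  explicit RawCopperAccessor(std::vector<uint32_t> words) : m_words(std::move(words)) {}

  std::size_t numWords() const { return m_words.size(); }

  // Hypothetical: event number assumed to sit in the second header word.
  uint32_t eventNumber() const { return m_words.at(1); }

  // Hypothetical: return the payload of FINESSE slot 0..3, given slot offsets
  // and lengths (in a real class these would be decoded from the headers).
  std::vector<uint32_t> finessePayload(int slot) const
  {
    const std::size_t begin = m_offset[slot];
    const std::size_t end   = begin + m_length[slot];
    return std::vector<uint32_t>(m_words.begin() + begin, m_words.begin() + end);
  }

private:
  std::vector<uint32_t> m_words;                 // the array stored "as is"
  std::size_t m_offset[4] = {0, 0, 0, 0};        // placeholders: slot start indices
  std::size_t m_length[4] = {0, 0, 0, 0};        // placeholders: slot word counts
};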

16

Performance test of streaming

Environment to run basf2 on COPPER: test bench on the Tsukuba B3 floor

[Figure: Two COPPER3 boards with the new PrPMC CPU, network-booted, connected by GbE to a readout PC (Xeon @ 3 GHz) running the dataflow and slow-control software.]

- Scientific Linux 5.7 (network boot) + Belle II software library
- Raw data management class + dummy raw data generator
- Data transfer to the readout PC over a socket, with streaming/destreaming

17

Performance test of streaming (continued)

Performance measurement on the same test bench:
* Event size: 500 bytes/board assumed
* Requirement: send raw data at > 30 kHz

Results:
a) ROOT streamer: ~5 kHz  <- very slow
b) Hand-written streamer: ~20 kHz
   * not yet tuned (streaming code / network parameters); further improvement is expected
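For scale, the per-board output bandwidth implied by these numbers is small compared with the GbE link, which is consistent with the limitation being the CPU cost of streaming rather than the network (arithmetic only, using the 500 bytes/board and 30 kHz figures above):

% Per-COPPER output bandwidth implied by the target rate
500\ \mathrm{bytes/event} \times 30\,000\ \mathrm{events/s} = 15\ \mathrm{MB/s} \ll 125\ \mathrm{MB/s}\ (\mathrm{GbE})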

18

Raw data after event building

- Event building is performed using DataStore::StoreArray. A collection of RawCOPPER objects is stored in the StoreArray.

Example: the RawCDC class
 * Inherits from the RawCOPPER class.
 * Contains the raw data from one COPPER module.
The data from the COPPERs of a subdetector are collected in a StoreArray (event building):

StoreArray<RawCDC> rawcdcArray;
for (int i = 0; i < ncopper; i++) {
  new (rawcdcArray[i]) RawCDC(....);   // place each COPPER's raw data into the array
  .......
}


19

Expected data rate / size reduction for an L1 trigger rate of 30 kHz, with a loose HLT trigger and the final trigger at Prompt Reco

[Figure: Readout chain from the detectors except the pixel (~80 kB/ev at ~30 kHz) through COPPERs, R/O PCs, Event Builder 1 (size reduction by formatting; ~100 kB/ev at ~30 kHz), and the HLT (RFARM) units (rate reduction = 1/3), which send track parameters + event tag to the PXD readout. The PXD (~1 MB/ev at ~30 kHz) goes through the PXD readout processor/switch, where noise reduction by track association (size reduction = 1/10) plus rate reduction by the HLT selection leave ~80-100 kB/ev at ~10 kHz into Event Builder 2. At the recording RAID: ~180 kB/ev at ~10 kHz = ~1.8 GB/s; after Prompt Reco (rate reduction = 1/2): ~180 kB/ev at ~5 kHz = ~0.9 GB/s.]
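The quoted storage bandwidths follow directly from the event sizes and rates in the figure (arithmetic only):

% Bandwidth at the recording RAID and after the Prompt Reco selection
180\ \mathrm{kB/event} \times 10\ \mathrm{kHz} \approx 1.8\ \mathrm{GB/s} \quad (\text{recording RAID})
180\ \mathrm{kB/event} \times 5\ \mathrm{kHz} \approx 0.9\ \mathrm{GB/s} \quad (\text{after Prompt Reco})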

20

Backup Slides

21

Estimated event size and bandwidth

Assumed L1 rate = 30 kHz (maximum of average)

[Table of estimated event sizes and bandwidths: to be fixed]

22

[Figure: Comparison of two HLT selection schemes ([Previous Design] vs [New idea]). In one scheme, the HLT (real time) runs formatting, tracking (SVD+CDC), calorimeter clustering, PID, the physics skim, Level 4 selection, and the PXD-finder, i.e. the full event selection = 1/5 reduction, and Prompt Reco (tracking, calorimeter clustering, PID, vertexing with PXD+SVD+CDC, DST production, using the RoIs) applies no further selection. In the other, a loosened HLT replaces the physics skim with a rough skim, giving a rough event selection = 1/3 reduction, and Prompt Reco adds the physics skim as a fine selection of ~1/2 before offline storage. Both schemes share the PXD R/O, Event Builder 2, and the online storage path.]

23

5. Integration of Pixel Detector

[Figure: Detector front-end digitizers (CDC, SVD, ...) feed ~300 COPPERs (~0.5M channels) and ~30 R/O PCs into Event Builder 1 and the HLT distributor at ~30 kHz; the PXD front-end digitizers (~1 MB/ev) feed the PXD readout box (ATCA), which receives the RoIs from the HLT (6-10 kHz) and sends the reduced PXD data to Event Builder 2 together with the HLT output (~100 kB/ev), for recording at ~200 kB/ev and 10 kHz.]

- The HLT performs special low-momentum tracking and obtains "RoIs" on the PXD surface for the reconstructed tracks.
- The RoIs are sent to the PXD readout box for HLT-accepted events.
- The PXD box associates PXD hits with the RoIs by FPGA processing, and only the associated hits are sent to the 2nd EVB -> ~1/10 reduction in data size plus 1/3-1/5 in rate expected.
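The hit/RoI association itself runs in FPGA firmware inside the PXD readout box; purely to make the selection concrete, a software-style sketch of the same check (is a pixel hit inside any region of interest on its sensor?) could read as follows, with all structure and field names invented.

// Software-style sketch of the RoI association done in the PXD readout box:
// keep a pixel hit only if it falls inside at least one HLT-provided region
// of interest on the same sensor.  In reality this runs in FPGA firmware.
#include <cstdint>
#include <vector>

struct PixelHit         { uint16_t sensorId, row, col; };
struct RegionOfInterest { uint16_t sensorId, rowMin, rowMax, colMin, colMax; };

bool insideRoi(const PixelHit& hit, const RegionOfInterest& roi)
{
  return hit.sensorId == roi.sensorId &&
         hit.row >= roi.rowMin && hit.row <= roi.rowMax &&
         hit.col >= roi.colMin && hit.col <= roi.colMax;
}

std::vector<PixelHit> selectHits(const std::vector<PixelHit>& hits,
                                 const std::vector<RegionOfInterest>& rois)
{
  std::vector<PixelHit> kept;
  for (const auto& hit : hits)
    for (const auto& roi : rois)
      if (insideRoi(hit, roi)) { kept.push_back(hit); break; }   // forward only matched hits
  return kept;
}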