Nicola De Filippis CMS Italia, Napoli, 13-14 Feb. 2007 - p. 1 Produzioni MC ai Tiers CMS nel 2007:...


Page 1:

MC production at the CMS Tiers in 2007: CMS-wide prospects and the Italian contribution

Università, Politecnico e INFN Bari

N. De Filippis, M. Abbrescia, G. Cuscela, G. Donvito,

G. Maggi, S. My, A. Pierro, A. Pompili,

+ contribution of developers (Kavka, Fanfani, Codispoti, Bacchi)

Page 2:

Outline

Status of CMS Monte Carlo production:

organization and current requests

Monte Carlo production in Italy:

Post-CSA06 activity

Problems with sites

Efficiency of Italian sites

Reliability of sites

CMS plans and milestones for 2007

Page 3:

MC production cycle

Goal of MC production: to produce events for CMSSW validation (simulation/reconstruction) and physics studies.

- Small RelVal samples upon each new CMSSW release
- PhysVal / HLT groups make requests in the form of cfg's
- Experts provide ProdAgent workflows
- Assignments to production teams are posted on the twiki: https://twiki.cern.ch/twiki/bin/view/CMS/ProdOps
- Currently 6 teams: LCG(1,2,3,5,6) and OSG
- Each team has O(10) dedicated T1/T2 sites
- When done, files are merged and injected into PhEDEx
- Too many manual steps and too many extra-production duties (e.g. monitoring and dealing with site availability and stability)
- A lot of pressure from the SDV group (P. Janot) to produce events ASAP
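The cycle described on this slide can be pictured as a request moving through a fixed sequence of stages. The sketch below is only an illustration: the stage names, the `Request` class and the sample name are assumptions made for this example and are not part of ProdAgent.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    REQUESTED = auto()   # PhysVal/HLT group submits a cfg
    WORKFLOW = auto()    # experts turn the cfg into a ProdAgent workflow
    ASSIGNED = auto()    # workflow assigned to a production team
    PRODUCED = auto()    # jobs run at the team's T1/T2 sites
    MERGED = auto()      # output files merged
    INJECTED = auto()    # merged files injected into PhEDEx


ORDER = list(Stage)  # Enum iteration follows definition order


@dataclass
class Request:
    name: str                       # hypothetical sample name
    team: str = ""                  # e.g. "LCG(3)" once assigned
    stage: Stage = Stage.REQUESTED
    history: list = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next stage; refuse to go past INJECTED."""
        idx = ORDER.index(self.stage)
        if idx == len(ORDER) - 1:
            raise ValueError(f"{self.name} is already injected")
        self.history.append(self.stage)
        self.stage = ORDER[idx + 1]
        return self.stage


req = Request(name="example_sample")
req.advance()            # REQUESTED -> WORKFLOW
req.team = "LCG(3)"
req.advance()            # WORKFLOW -> ASSIGNED
print(req.stage.name)    # ASSIGNED
```

Each manual step mentioned on the slide corresponds to one `advance()` call made by a person, which is where the operational load comes from.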

Page 4:

Current official requests

P. Kreuzer

After CSA06:
- CMSSW_1_1_1 and 1_1_2 used until Xmas
- CMSSW_1_2_0 released mid-Dec06
- Production with CMSSW_1_2_0 running continuously since Dec06
- PhysVal requests (10M w/o PU + 16.5M w PU)
- HLT requests (100M w/o PU + 20M w PU x 2)
- HLT + PU in 2 steps: GEN-SIM / DIGI-RECO

About 20M done, many running, but very tight schedule! Some samples:
- QCD di-jets (0 < pt-bin < 3.5 TeV), w & w/o PU
- Excl. W & Z decays, Wjets (0 < pt < 1 TeV), w & w/o PU
- Inclusive ttbar, …

See https://twiki.cern.ch/twiki/bin/view/CMS/ProdOps120

Page 5:

PhysVal samples with CMSSW_1_2_0

[Figure not reproduced; labels: LCG(3), LCG(3)]

Page 6:

HLT samples with CMSSW_1_2_0

[Figure not reproduced; label: LCG(3)]

After the 120 bulk production is over, a few «special» requests will be addressed:
- Muon-enriched sample with 121: a few hundred K events
- Cosmics for Tracker with 122: 2.5-5M events

Page 7:

Ongoing effort of OSG and LCG(1,2,5,6)

Conclusions of P. Kreuzer: with 2 new and efficient production teams on board, the remaining 120 assignments should be delivered (at least partially) within 10 days.

Page 8:

MC production in Italy

Page 9:

Post-CSA06 activity (1)

Official CSA06 note complete

Internal CMS note on CSA06 in the Italian tiers complete

CSA06 analyses completed

Page 10:

Post-CSA06 activity (2)

Since October 2006 until today the LCG(3) team has:

- restarted the Monte Carlo production and kept it running without interruption, even during the Xmas break
- increased the number of experts able to run ProdAgent
- exported the monitoring tool developed at Bari to the other LCG teams
- produced about 15M events for Physics validation and HLT studies, with and without PU: 1/3 of the entire CMS production
- used the European LCG resources continuously, providing a great deal of feedback for resolving problems at remote sites

Page 11:

Sites used by the LCG(3) team

CERN used intensively before and after Xmas.

[Figure not reproduced; site groups shown: Italian sites, English sites, Hungary, Taiwan, IN2P3]

Page 12:

Ongoing effort of LCG(3)

Ongoing GEN-SIM and DIGI-RECO with low-luminosity pileup

Page 13:

Issues about ProdAgent

Production setup at Bari: 3 instances of PA running:
- two for FEVT and GEN-SIM production
- one for DIGI-RECO production with PU
One machine for on-line dump of the DBs.

Monitoring tool exported to other LCG teams with positive feedback.

Job submission is rather slow (up to 2-3 jobs/min) due to:
- performance of the PA machines, which are two years old
- overhead of the RBs
- no bulk submission

Controlling jobs that failed or aborted because of middleware problems is difficult.

Killing the jobs of a given production, or those submitted to a given site, was problematic; the PA developers provided a script to do this.

LCG(3) will smoothly hand over the English CEs to LCG(6) (the English team) and IN2P3 to LCG(5) (the Belgian team) as far as debugging and intensive use are concerned.

In the long run: bulk submission and Resource Monitor.
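To see why a submission rate of 2-3 jobs/min is a bottleneck, here is a back-of-the-envelope estimate of how long it takes just to submit a batch of jobs. The batch size of 1000 jobs is an assumption chosen for illustration, not a number from the slides.

```python
def submission_hours(n_jobs: int, jobs_per_minute: float) -> float:
    """Hours needed to submit n_jobs at a constant submission rate."""
    if jobs_per_minute <= 0:
        raise ValueError("jobs_per_minute must be positive")
    return n_jobs / jobs_per_minute / 60.0


n_jobs = 1000  # hypothetical batch size
for rate in (2.0, 3.0):  # the "up to 2-3 jobs/min" quoted above
    hours = submission_hours(n_jobs, rate)
    print(f"{n_jobs} jobs at {rate:.0f} jobs/min: {hours:.1f} h")
# 1000 jobs at 2 jobs/min: 8.3 h
# 1000 jobs at 3 jobs/min: 5.6 h
```

Since 2-3 jobs/min is quoted as an upper limit, these figures are lower bounds on the submission time, which is consistent with the interest in bulk submission.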

Page 14:

Problems with sites

Most LCG(3) sites had various problems before and during the Xmas break:
- November: Bari, Pisa, Roma when restarting production
- CNAF: problems with castor

English sites and IN2P3 had alternating periods of activity, also during the last month.

Italian sites were really efficient during the last month.

Debugging of sites is typically really painful and requires continuous interaction with the site administrators. Problems:
- Stage-out was the main cause of job failures.
- Site validation: storage, software tag, software mount points, local copy of PU.
- Grid problems: instabilities of the CE because of high load, and overload of the RBs, which caused:
  - RB didn't change the status of jobs («Waiting» status forever)
  - No chance to monitor: FWJobreport and log files lost
  - Difficult/tedious for production teams to kill jobs via BOSS commands

The debugging of sites is not a task to be covered by production teams. CMS is reacting and preparing centralized tests to ensure the reliability of sites.

Page 15:

Efficiency of the Italian sites (last month): CNAF

[Plot not reproduced; annotations: No PU, CE replaced]

Except for a few days, CNAF worked very well, ensuring high efficiency of the CMS production during the last month.

Page 16:

Statistics of use of CNAF (last month)

CPU hours and percentage of Tier-1 resources used by CMS:

Month-week        | CPU hr | %
---------------------------------------
15 Jan - 21 Jan   |        | 33.4%
22 Jan - 28 Jan   |        | 19.0%
29 Jan - 4 Feb    |        | 24.8%
5 Feb - 11 Feb    |        | 22.4%

The percentage of use depends on the fairshare setup at CNAF.

[Plot not reproduced: successful jobs]

Queues always full of jobs; CMS at maximum use at CNAF.
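As a quick summary of the four weekly percentages quoted on this slide, the snippet below computes their simple average and the best and worst weeks. It is an unweighted mean of the weekly shares; a CPU-hour-weighted figure would need the CPU hr values, which are not available here.

```python
# Weekly share of CNAF Tier-1 resources used by CMS, in percent (from the slide).
weekly_share = {
    "15 Jan - 21 Jan": 33.4,
    "22 Jan - 28 Jan": 19.0,
    "29 Jan - 4 Feb": 24.8,
    "5 Feb - 11 Feb": 22.4,
}

mean_share = sum(weekly_share.values()) / len(weekly_share)
peak_week = max(weekly_share, key=weekly_share.get)
low_week = min(weekly_share, key=weekly_share.get)

print(f"mean share: {mean_share:.1f}%")  # mean share: 24.9%
print(f"highest: {peak_week} ({weekly_share[peak_week]}%)")
print(f"lowest:  {low_week} ({weekly_share[low_week]}%)")
```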

Page 17:

Efficiency of the Italian sites (last month): INFN

Except for limited problems with the storage at Bari, Pisa and Rome, all the Italian Tier-2-like sites worked very well during the last month.

Page 18:

Statistics from dashboard

Page 19:

Reliability of sites: tests

Following the feedback on problems found by production operators, CMS is defining centralized tests, to be run periodically, to certify sites for production and analysis. The ideas are:

1) Submit a small processing job for each advertised CMSSW release at a site. This job checks that:
- the job can be submitted to the site
- local stage-out can be done
- a report can be sent back via the grid middleware
- 10-event Minimum Bias? Test frontier access as well?

2) Following completion of the test job, submit a read-back job, which:
- verifies job submission
- checks data access
- cleans up the file, to test the cleanup procedure

3) Check global DBS datasets at the site:
- check read access to all fileblocks at the site
- report back bad files and invalidate them in DBS
- perhaps randomly select a dataset to test every day/week etc.
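The first two tests on this slide can be sketched as a small certification loop: for each advertised release, run the processing checks, and only if they all pass run the read-back checks. This is a minimal sketch, not the actual CMS test suite: the check names, the `probe` callable and the site name `T2_Example` are all hypothetical, and the third test (DBS dataset checks) is left out.

```python
from typing import Callable, Dict, List

# Check names are hypothetical labels for the steps listed on the slide.
PROCESSING_CHECKS = ("submit", "stage_out", "report_back")        # test 1
READ_BACK_CHECKS = ("readback_submit", "data_access", "cleanup")  # test 2

Probe = Callable[[str, str, str], bool]  # (site, release, check) -> passed?


def certify_site(site: str, releases: List[str],
                 probe: Probe) -> Dict[str, Dict[str, bool]]:
    """Run test 1 for every release, and test 2 only where test 1 passed."""
    report = {}
    for release in releases:
        outcome = {c: probe(site, release, c) for c in PROCESSING_CHECKS}
        if all(outcome.values()):
            for check in READ_BACK_CHECKS:
                outcome[check] = probe(site, release, check)
        report[release] = outcome
    return report


def is_certified(report: Dict[str, Dict[str, bool]]) -> bool:
    """A site is certified only if every check ran and passed for every release."""
    n_checks = len(PROCESSING_CHECKS) + len(READ_BACK_CHECKS)
    return bool(report) and all(
        len(outcome) == n_checks and all(outcome.values())
        for outcome in report.values()
    )


def fake_probe(site: str, release: str, check: str) -> bool:
    # Stand-in for real grid operations: pretend stage-out fails for one release.
    return not (release == "CMSSW_1_2_0" and check == "stage_out")


report = certify_site("T2_Example", ["CMSSW_1_1_2", "CMSSW_1_2_0"], fake_probe)
print(is_certified(report))  # False
```

Skipping the read-back job when the processing job fails mirrors the wording "following completion of the test job"; a real implementation might instead record the skipped checks explicitly.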

Page 20:

Reliability of sites: SAM tests

SAM (Service Availability Monitoring)

Hopefully the human resources needed for MC production will decrease, so that fewer production teams are needed, each able to submit jobs to any site.

Page 21:

Plans for MC production in 2007

Page 22:

2007 milestones

- Finalize 120 production (aim for mid-Feb!)
- Expecting small 12x requests (RelVal, Muon-enriched HLT, …)
- 130 release (all HLT components) end Feb07
- 130 HLT production in Mar07
- In parallel, Alpgen integration in production. Timescale: integrate till Mar07 + test samples, PH prod. Apr-May07
- 140 release (new geo) end Mar07
- 140 physics production Apr-May07 (30M / month)
- 150 release mid-May07 with improved reco algorithms (re-RECO)
- Launch CSA07 with 16x end-July07

The contribution of Italy to these activities, and the manpower for it, are still to be defined. In addition, CSA07 during the summer could be a real problem.
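For scale, the 140 physics production target of 30M events per month over Apr-May07 can be turned into a total with a one-line calculation. The helper name `planned_events` is made up for this illustration.

```python
def planned_events(events_per_month: int, months: list) -> int:
    """Total events for a production running at a fixed monthly rate."""
    return events_per_month * len(months)


total = planned_events(30_000_000, ["Apr07", "May07"])
print(f"{total / 1e6:.0f}M events")  # 60M events
```

That is four times the roughly 15M events produced by LCG(3) since October 2006, which gives a feel for the manpower question raised on this slide.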

Page 23:

Conclusions

Monte Carlo production by the LCG(3) team has run continuously from the end of CSA06 until now.

About 15M events produced (1/3 of the overall CMS production).

Italian sites worked very well during the last month, ensuring high-efficiency production.

Warning: keep a close watch on the Italian Tiers, mainly CNAF.

Effective interaction between operators and PA developers.

The load on production operators should decrease as soon as the centralized SAM tests run to certify sites for production.

The Italian contribution to the activities in preparation for CSA07, and to CSA07 itself, has to be discussed.