22 September 2005 - CSN1, P. Capiluppi: CMS Computing TDR (and more…)
CMS Computing TDR (and more…)

What the C-TDR describes (and what it does not)
The revisited Computing Model: what has changed
The "services" and the operation of the Computing
Some results
This document is primarily concerned with the preparations for the first full year of LHC running, expected to be 2008.
C-TDR: what it is and what it is not
Description of a possible implementation of the Computing (the "C" part of CMS-CPT)
Estimates for the first LHC "full year" (nominally 2008), with the necessary ramp-up before and after
The "Software" (in detail…) will be in Physics TDR vol. 1 (~end 2005); the "P" part will be in Physics TDR vol. 1 and vol. 2 (~April 2006)
It is not a "blueprint" of the Computing System, but a description of what the "baseline" can be and how to proceed beyond it
It goes hand in hand with the LCG TDR (WLCG in the document): services, Tiers, organization, etc.
The CMS computing environment is a distributed system of computing services and resources that interact with each other as Grid services. The set of services and their behavior together provide the CMS computing system as part of the Worldwide LHC Computing Grid.
Together they comprise the computing, storage and connectivity resources that CMS uses for data processing, data archiving, event generation, and all other computing-related activities.
Computing Model: changes
CERN CAF (CERN Analysis Facility), replacing the Tier-1 and Tier-2 at CERN
Integration with Grid (few flavours), with a "common" interface towards the applications: involvement in the specifications; adaptation of the CMS software tools; distributed infrastructure (Computing System) based on (W)LCG
Increased tape needs at the Tier-1s
Definition of the Tier-3s
WAN bandwidth reduced by a factor of ~2 at the Tiers
Usage efficiency removed for consistency with the other experiments
Identification of the Tier-1s and Tier-2s together with LCG
Tiered Architecture
Not all connections are shown - for example flow of MC data from Tier-2’s to Tier-1’s or peer-to-peer connections between Tier-1’s.
Tier-3 Centers
Functionality: user interface to the computing system; final-stage interactive analysis, code development, testing; opportunistic Monte Carlo generation
Responsibility: most institutes; desktop machines up to group cluster
Use by CMS: not part of the baseline computing system; uses distributed computing services, does not often provide them; not subject to formal agreements
Resources: not specified; very wide range, though usually small; desktop machines -> University-wide batch system. But: integrated worldwide, they can provide significant resources to CMS on a best-effort basis
CMS-CAF
Functionality: CERN Analysis Facility, a development of the CERN Tier-1 / Tier-2; integrates the services associated with Tier-1/2 centers. Primary: provide latency-critical services not possible elsewhere (detector studies required for efficient operation, e.g. trigger; prompt calibration; "hot" channels). Secondary: provide additional analysis capability at CERN
Responsibility: CERN IT Division
Use by CMS: the CMS-CAF is open to all CMS users (as are Tier-1 centers), but the use of the CAF is primarily for urgent (mission-critical) tasks
Resources: approx. 1 "nominal" Tier-1 (less MSS due to Tier-0) + 2 "nominal" Tier-2: CPU 4.8 MSI2K; Disk 1.5 PB; MSS 1.9 PB; WAN >10 Gb/s
NB: the CAF cannot arbitrarily access all RAW&RECO data during running, though in principle it can access "any single event" rapidly.
Project Organization
Services Overview
Basic Distributed Workflow
The C-TDR has served to converge on a basic architectural blueprint for a baseline system.
We are now beginning the detailed technical design of the components.
It should be possible to bring up such a system over the next 6 to 9 months for the cosmic challenge and then CSA 2006.
Data Management
Data organization:
"Event collection": the smallest unit larger than one event. Events clearly reside in files, but CMS DM will track collections of files (aka blocks), though physicists can work with individual files.
"Dataset": a group of event collections that "belong together"; defined centrally or by users.
Data management services:
Data book-keeping system (DBS): "what data exist?" NB: can have global or local scope (e.g. on your laptop); contains references to parameter, lumi, data quality info.
Data location service (DLS): "where are the data located?"
Data placement system (PhEDEx), making use of the underlying Baseline Service transfer systems.
Site-local services: local file catalogues, data storage systems.
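The lookup chain described above can be sketched as follows. This is an illustrative toy model, not the real DBS/DLS API: the dataset name, block names, and site names are invented, and the real services are distributed, not in-memory dictionaries.

```python
# Toy model of the CMS data-management lookup chain: DBS answers
# "what data exist?", DLS answers "where are the data located?".
# All names below are hypothetical.

# Bookkeeping: dataset -> file blocks (collections of files)
DBS = {
    "/Higgs/HWW2mu2nu/RECO": ["block-001", "block-002"],
}
# Location service: block -> sites hosting a replica
DLS = {
    "block-001": ["T1_CNAF", "T2_Legnaro"],
    "block-002": ["T1_CNAF"],
}

def sites_for_dataset(dataset):
    """Return the sites that host every block of the dataset."""
    blocks = DBS[dataset]
    common = set(DLS[blocks[0]])
    for b in blocks[1:]:
        common &= set(DLS[b])
    return sorted(common)

print(sites_for_dataset("/Higgs/HWW2mu2nu/RECO"))  # ['T1_CNAF']
```

Resolving block names to physical file names would then be done by the site-local file catalogue at the chosen site.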
Workload Management
Running jobs on CPUs… rely on Grid workload management, which must:
Allow submission at a reasonable rate: O(1000) jobs in a few seconds
Be reliable: 24/7, >95% job success rate
Understand job inter-dependencies (DAG handling)
Respect priorities between CMS sub-groups; priority changes implemented within a day
Allow monitoring of job submission and progress
Provide a properly configured environment for CMS jobs
Beyond the baseline: introduce the "hierarchical task queue" concept. A CMS "agent" job occupies a resource, then determines its task, i.e. the work is "pulled" rather than "pushed". This allows rapid implementation of priorities and diagnosis of problems.
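A minimal sketch of the pull model just described, with assumed names (this is not CMS code): a generic agent job lands on a worker node and only then asks a central, priority-ordered task queue what to work on, so priority changes take effect on the very next pull.

```python
# Sketch of the 'hierarchical task queue' pull model: agents occupy
# resources, then pull the highest-priority task. Names are invented.
import heapq

class TaskQueue:
    """Central queue; lower priority number = more urgent."""
    def __init__(self):
        self._heap = []

    def push(self, priority, task):
        heapq.heappush(self._heap, (priority, task))

    def pull(self):
        """Called by an agent once it occupies a CPU slot."""
        return heapq.heappop(self._heap)[1] if self._heap else None

queue = TaskQueue()
queue.push(2, "MC production job")
queue.push(1, "urgent calibration job")

# The agent pulls work instead of having it pushed to it:
assert queue.pull() == "urgent calibration job"
assert queue.pull() == "MC production job"
```

Since the task is chosen at pull time rather than at submission time, reordering the queue is enough to re-prioritize all not-yet-started work.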
Integration Program
This activity is a recognition that the program of work for testing, deploying, and integrating components has different priorities than either the development of components or the operation of computing systems.
The Technical Program is responsible for implementing new functionality, design choices, technology choices, etc.
Operations is responsible for running a stable system that meets the needs of the experiment. Production is the most visible operations task, but analysis and data serving are growing; event reconstruction will follow.
The Integration Program is responsible for installing components in evaluation environments, integrating individual components to function as a system, performing evaluations at scale, and documenting results.
The Integration Activity is not a new set of people, nor is it independent of either the Technical Program or the Operations Program; integration will rely on a lot of existing effort.
Project Phases
Computing support for the Physics TDR (-> Spring '06): core software framework, large-scale production & analysis
Cosmic Challenge (Autumn '05 -> Spring '06): first test of data-taking workflows; data management, non-event data handling
Service Challenges (2005-06): exercise computing services together with WLCG + centres; system scale: 50% of a single experiment's needs in 2007
Computing, Software, Analysis (CSA) Challenge (2006): ensure readiness of software + computing systems for data; 10M's of events through the entire system (incl. T2)
Commissioning of the computing system (2006-2009): steady ramp-up of the computing system to full-lumi running.
[Timeline chart, 2005-2009: P-TDR, SC3-4, Cosmic, CSA-06, Commissioning]
CPT organization: Today
Project Manager: P. Sphicas; Project Office: V. Innocente, L. Taylor
Computing: L. Bauerdick, D. Stickland
  Technical Program: P. Elmer, S. Lacaprara; Integration Program: S. Belforte, I. Fisk; Operations Program: L. Barone; Facilities and Infrastructure: N. Sinanis
Software: L. Silvestris, A. Yagil
  Framework: L. Sexton; Reconstruction: T. Boccali; Analysis Tools: L. Lista; Simulation: M. Stavrianakou; Calibr/alignment: O. Buchmuller, L. Lueking; EvF/DQM: E. Meschi; Fast Simulation: P. Janot; Generator Tools: F. Moortgat, S. Slabospitsky; ORCA for PTDR: S. Wynhoff; Software Devel. Tools: S. Argiro'
Detector-PRS: D. Acosta, P. Sphicas
  ECAL/e-gamma: C. Seez, Y. Sirois; Tracker/b-tau: I. Tomalin, F. Palla; HCAL/JetMET: J. Rohlf, C. Tully; Muons: N. Neumeister, U. Gasparini; Online Selection: S. Dasu, C. Leonidopoulos
Analysis-PRS: A. De Roeck, P. Sphicas
  Higgs: S. Nikitenko; SUSY & BSM: L. Pape, M. Spiropulu; Standard Model: J. Mnich; Heavy Ions: B. Wyslouch
Analysis: CMS-CRAB Monitor
Via the LCG Grid; only a part of the CMS jobs (others are e.g. simulation, transfers, validation, etc.)
L. Faina, L. Servoli, D. Spiga, H. Riahi (PG)
[Monitor plots: submitted jobs, submitted from, destination of jobs]
Analysis jobs (CMS-CRAB via Grid): ~100000 CRAB jobs in ~3 months
[Plot: number of jobs per week, 10 Jul to 18 Sep 2005. By Daniele Spiga]
CMS-CRAB Monitor: data analysis
[Plots: number of submissions and number of jobs per task. By Daniele Spiga]
CMS-CRAB Monitor: data analysis
By Daniele Spiga
New Frontiers in Subnuclear Physics - Sept 12th-17th, 2005 - Milano, Italy (D. Bonacorsi)
LCG Service Challenge 3 ('SC3')
Data transfer and data serving in real use-cases. Two phases:
1. Jul 05: SC3 "throughput" phase: Tiers simultaneous import/export, MSS involved; move real files, store on real hw
2. >Sep 05: SC3 "service" phase: small-scale replica of the overall system; modest throughput, main focus on testing a quite complete environment
SC3 rates
C-TDR resources and INFN
pledged = Chris Eck tables, Sept 05
"should be" = CMS C-TDR Italian expected contribution

Tier1-CNAF (Tier1 for CMS):
                      2006   2007   2008   2009   2010   2008 share of all CMS
CPU pledged (kSI2K)    630    840   1930   2800   4030   13%
CPU "should be"        760   1900   3790   5190  10180   25%
Disk pledged (TB)      300    420    880   1400   2030   13%
Disk "should be"       210    520   1750   2620   3930   25%
Tape pledged (TB)      300    350    740   1440   2100    4%
Tape "should be"       380    960   4160   7370  10570   25%

Tier2s (CMS Tier2s):
                      2006   2007   2008   2009   2010   2008 share of all CMS
CPU pledged (kSI2K)    530    880   1750   2800   3850    9%
CPU "should be"        680   1710   3430   5740   9170   18%
Disk pledged (TB)      110    210    530    880   1230   11%
Disk "should be"       100    260    870   1740   2620   18%
Tier1-CNAF (CMS only)
[Charts: CPU (kSI2K), Disk (TB) and Tape (TB) for 2006-2010, comparing "Pledged LCG Sept 05" vs "should be"]
Tier2s CMS Italy
[Charts: CPU (kSI2K) and Disk (TB) for 2006-2010, comparing "Pledged LCG Sept 05", "should be", and "requests INFN"]
2006 Requests, CMS Computing
(Columns: capital equipment "Inventariabile" in kEuro; people with extra missions "Missioni E.", m.u. = mesi-uomo, person-months, in kEuro; totals; capacity in kSI2K and TB. "nulla" = none.)

Site     Tier   Description                kEuro   People / extra missions   kEuro   Total kEuro   kSI2K   TB
Bari     Tier2  30 kSI2K - 10 TB            40.0   L.S. 4 m.u.                20.0          60.0      30    10
Bologna  Tier3  none                         0.0   P.C., C.G. 4 m.u.          18.0          18.0       0     0
Catania  Tier3  2 dual                       5.0                                             5.0      10     0
Firenze  Tier3  3 TB                         9.0                                             9.0       0     3
Legnaro  Tier2  60 kSI2K - 15 TB            68.0   M.B. 1 m.u.                 5.0          73.0      60    15
Milano   Tier3  2 dual - 2 TB               11.0                                            11.0      10     2
Napoli   Tier3  none                         0.0   L.L. 2 m.u.                10.0          10.0       0     0
Padova   Tier3  none                         0.0   U.G., S.L. 6 m.u.          30.0          30.0       0     0
Pavia    Tier3  none                         0.0                                             0.0       0     0
Perugia  Tier3  6 dual - 2 TB               21.0                                            21.0      30     2
Pisa     Tier2  50 kSI2K - 15 TB            62.0   T.B., F.P., G.B. 7 m.u.    35.0          97.0      50    15
Roma1    Tier2  30 kSI2K - 15 TB - switch   60.0   L.B. 2 m.u.                10.0          70.0      30    15
Torino   Tier3  5 dual - 2 TB               18.5                                            18.5      25     2
Trieste  Tier3  2 dual - 1 TB + box         11.0   S.B. 10 m.u.               45.0          56.0      10     1
Total                                      305.5                             173.0         478.5     255    65
CNAF     Tier1  125 kSI2K - 100 TB         166.0                                           166.0     125   100
Tier2s                                     230.0                              70.0         300.0     170    55
Tier3s                                      75.5                             103.0         178.5      85    10
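The subtotal rows in the 2006 request table can be cross-checked against the per-site entries; the short script below (numbers copied from the table) verifies that the Tier2 and Tier3 columns add up.

```python
# Consistency check of the 2006 request table: per-site capital
# ('inventariabile') and extra-mission kEuro vs the subtotal rows.
tier2_capital  = [40.0, 68.0, 62.0, 60.0]   # Bari, Legnaro, Pisa, Roma1
tier3_capital  = [0.0, 5.0, 9.0, 11.0, 0.0, 0.0, 0.0, 21.0, 18.5, 11.0]
tier2_missions = [20.0, 5.0, 35.0, 10.0]
tier3_missions = [18.0, 10.0, 30.0, 45.0]   # Bologna, Napoli, Padova, Trieste

assert sum(tier2_capital) == 230.0          # Tier2s subtotal
assert sum(tier3_capital) == 75.5           # Tier3s subtotal
assert sum(tier2_missions) == 70.0
assert sum(tier3_missions) == 103.0
assert sum(tier2_capital) + sum(tier3_capital) == 305.5   # grand total
```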
Milestones 2005: specifications
CMS Computing TDR (and LCG TDR) [July 2005]
Definition of the Tier1 and Tier2 resources participating in the Computing Model [February 2005]
Definition of the services (data, software, LCG) available in Italy [February 2005]
Definition of the participation of CMS Italy in LCG and INFN Grid [May 2005]
Production and analysis of the data for the P-TDR [January 2006]. Physics groups of Italian responsibility and interest: b-tau, muon, e-gamma, Higgs, SUSY, SM
  The Tier1 and at least half of the Tier2/3s participate [July 2005]
  Production of ~2 M events/month [January 2005]
  Preliminary analysis of at least 4 channels (e.g. H->WW->2mu2nu) [February 2005]
  The Tier1, all Tier2s and more than half of the Tier3s participate [October 2005]
  Production of ~20 M events [December 2005]
  Analysis of at least 8 channels for the P-TDR [December 2005]
Deployment of a prototype distributed analysis system on LCG [April 2005]
  Definition of the components [January 2005]
  Definition and implementation of the Italian organizational infrastructure [February 2005]
Data Challenge 05 completed (~20% INFN) [December 2005]
[Status markers on the slide: Done; in progress; in progress (~50%); Done]
Milestones 2006
June 2006: LCG Service Challenge 4 (SC4) starts, with the software and computing support for the Cosmic Challenge ready; includes Tier1-CNAF + at least half of the CMS-Italy Tier2s.
October-November 2006: Computing, Software and Analysis Challenge (CSA-2006); includes Tier1-CNAF, all CMS-Italy Tier2s and some CMS-Italy Tier3s.
December 2006: integration of the computing systems at Tier1s and Tier2s ready for testing; includes all Italian Tiers for CMS.
Additional slides
Farm occupancy (Apr-Sept 05)
Total CPU time, total wallclock time
http://tier1.cnaf.infn.it/monitor/LSF/plots/acct/
KSpecInt2k (May-August)

Experiment  Completed jobs   CPU time     Total time    KSpecInt2k used (weekly average)
babar       100217           109916457     169361999     26 (200)
atlas        39825            28170951     192104861     27 (113)
lhcb        277409           867997601    1057231777    180 (163)
cdf         227406           668050691    1219094409    213 (75)
alice         9073            20812982      35077905      6 (209)
cms         101934           125029660     417938735     67 (280)
argo         13877            10041606      11703758      2 (10)
ams           5055            64060316      72616664     12 (32)
virgo          358             1486397       1496847      0 (24)
pamela          69                0.24          0.52      0
quarto         139             1995717       1998859      1
geant4           3               0.000         0.003
magic          186             1829721       1835301      0 (10)
biomed        1435            21693331      24377695      4
Storage accounting

Exp      Total cap. (TB)  Total used (TB)  Castor disk cap. (TB)  Castor disk used (TB)  Pure disk cap. (TB)  Pure disk used (TB)  Tape cap. (TB)  Tape used (TB)
ALL      272.7 (250)      190.1            48.3                   33.5                   194.3                136.6
ALICE    9.0 (30)         5.6              7.4                    5.1                    1.6                  0.5                  11.3            7.3
ATLAS    9.4 (25)         4.8              9.4                    4.8                    0.0                  0.0                  21.2            14.9
CMS      56.2 (68)        37.2             1.7                    1.3                    54.5                 35.9                 9.6             3.6
LHCb     23.1 (15)        20.7             17.4                   16.4                   5.7                  4.3                  25.5            20.5
AMS      2.7 (2.6)        2.5              2.7                    2.5                    0.0                  0.0                  8.6             5.2
ARGO     3.8 (7)          1.6              1.9                    0.1                    1.9                  1.5
BABAR    74.9 (38)        61.4             0.0                    0.0                    64.9                 51.4                 *
CDF      42.2 (32)        33.1             0.0                    0.0                    32.2                 23.1                 *
MAGIC    1.1 (1)          0.6              0.0                    0.0                    1.1                  0.6                  *
VIRGO    37.2 (34)        22.5             6.3                    3.2                    30.9                 19.2                 *
PAMELA   1.6              0.0              1.6                    0.0                    0.0                  0.0                  *
GEANT4   0.0              0.0              0.0                    0.0                    0.0                  0.0
TEORICI  1.6              0.0              0.0                    0.0                    1.6                  0.0

NOTE: for BABAR ~32 TB more are also included and for CDF ~10 TB; 200 raw TB from the end of September, with phase-out of old hw (~20 TB); * = access to CASTOR configured but not yet used.
New Frontiers in Subnuclear Physics - Sept 12th-17th, 2005 - Milano, Italy (D. Bonacorsi)
CMS share w.r.t. the total farm occupancy [all jobs at the T1]

month               Jun 05                Jul 05
type of jobs        all jobs   Grid jobs  all jobs   Grid jobs
Jobs done [%]       24.4       43.6       14.8       27.6
CPU time [%]        5.1        8.3        5.0        7.9
Total time [%]      25.6       33.1       18.8       24.3

[T1 farm occupancy plot: running / pending / total nb. of jobs / max nb. of slots; annotated with a T1 power failure, a T1 switch problem, and the upgrade to LCG 2.6.0 (ramp-down, upgrade, LCG certification…)]

The T1 underwent some migration seasons: OS: RH SLC v.3.0.4; mw: LCG v.2.6.0, with ~90% of the WNs migrated and running LCG certification; WN/server installation: Quattor; batch scheduler: LSF v.6.1 + LCG interface.
It may be capable of addressing the needs of the experiments; focused effort is needed on reliability and 24/7 support, scheduled interventions, and troubleshooting.
Technical Program
job configuration and scheduling services, including policies, prioritisation of workloads, and ensuring scalability of job scheduling;
dataset placement and data transfer services;
dataset bookkeeping services, including physics metadata services;
instrumentation services and user interfaces, including a CMS dashboard, monitoring, job tracking;
CMS storage dataset access services, e.g., SRM, Castor, POOL/PubDB++/local file catalogs;
CMS workflows support, for the data taking and processing, MC production, and calibration workflows;
CMS VO services.
Integration Program
developing and maintaining the CMS Computing Integration Plan;
preparing for and running a series of data challenges and service challenges;
taking responsibility for validation, releases, and deployment;
providing workflow integration for production, dataset publishing, distributed analysis, data taking and processing;
providing component integration into a coherent and functional computing environment;
releasing and delivering the integrated computing environment of CMS computing services and components into the CMS production Grid environment, working with the WLCG;
liaising on a very practical level with CERN-IT, the CMS regional centres, the LCG and the Grid projects: EGEE, OSG, and NorduGrid.
Operations Program
developing and maintaining the CMS Computing operations model and plan, working with the Computing Management Team;
MC production operations;
database system operations;
calibration workflow support;
data-taking operations and data validation; and
user support.
Facilities Program
This covers services that are a shared need and responsibility of the full CMS Collaboration, such as support for the CMS common computing and software environment, software process services, and core support for production operations.
In preparation for data taking, CMS will need to keep close liaison and coordination with its Tier-1 centres, which provide the bulk of the computing resources required for data processing and data analysis.
Together, these form the CMS Computing Facilities program, consisting of:
CMS Tier-0 coordinator;
CMS CAF coordinator;
CMS common services and infrastructure coordinator;
CMS Tier-1 technical contacts.
CPT L1 and Computing L2 Milestones V34.2
Running Year          2007    2008     2009     2010
                      Pilot   2E33+HI  2E33+HI  E34+HI
Tier-0     CPU         2.3     4.6      6.9     11.5   MSi2k
           Disk        0.1     0.4      0.4      0.6   PB
           Tape        1.1     4.9      9       12     PB
           WAN         3       5        8       12     Gb/s
A Tier-1   CPU         1.3     2.5      3.5      6.8   MSi2k
           Disk        0.3     1.2      1.7      2.6   PB
           Tape        0.6     2.8      4.9      7.0   PB
           WAN         3.6     7.2     10.7     16.1   Gb/s
Sum Tier-1 CPU         7.6    15.2     20.7     40.7   MSi2k
           Disk        2.1     7.0     10.5     15.7   PB
           Tape        3.8    16.7     29.5     42.3   PB
A Tier-2   CPU         0.4     0.9      1.4      2.3   MSi2k
           Disk        0.1     0.2      0.4      0.7   PB
           WAN         0.3     0.6      0.8      1.3   Gb/s
Sum Tier-2 CPU         9.6    19.3     32.3     51.6   MSi2k
           Disk        1.5     4.9      9.8     14.7   PB
CMS CERN Analysis Facility (CMS-CAF)
           CPU         2.4     4.8      7.3     12.9   MSi2k
           Disk        0.5     1.5      2.5      3.7   PB
           Tape        0.4     1.9      3.3      4.8   PB
           WAN         0.3     5.7      8.5     12.7   Gb/s
Total      CPU        21.9    43.8     67.2    116.6   MSi2k
           Disk        4.1    13.8     23.2     34.7   PB
           Tape        5.4    23.4     41.5     59.5   PB

Average numbers: 6 Tier-1s, 22.5 Tier-2s
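The Total row can be cross-checked from the other rows; the snippet below (numbers copied from the 2008 column) verifies that Tier-0 + Sum Tier-1 + Sum Tier-2 + CMS-CAF reproduces the stated totals, up to the rounding used on the slide.

```python
# Consistency check of the 2008 column: the 'Total' row should be the
# sum of Tier-0, Sum Tier-1, Sum Tier-2 and CMS-CAF (small differences
# come from per-row rounding on the slide).
rows_2008 = {                 # (CPU MSi2k, Disk PB, Tape PB)
    "Tier-0":     (4.6,  0.4, 4.9),
    "Sum Tier-1": (15.2, 7.0, 16.7),
    "Sum Tier-2": (19.3, 4.9, 0.0),   # Tier-2s have no MSS/tape
    "CMS-CAF":    (4.8,  1.5, 1.9),
}
totals = [round(sum(v[i] for v in rows_2008.values()), 1) for i in range(3)]
stated = [43.8, 13.8, 23.4]
assert all(abs(t - s) <= 0.2 for t, s in zip(totals, stated)), totals
```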
Data Tiers
RAW: detector data + L1, HLT results after online formatting; includes factors for poor understanding of the detector, compression, etc.; 1.5 MB/evt @ <200 Hz; ~5.0 PB/year (two copies)
RECO: reconstructed objects with their associated hits; 250 kB/evt; ~2.1 PB/year (incl. 3 reproc versions)
AOD: the main analysis format; objects + minimal hit info; 50 kB/evt; ~2.6 PB/year, with a whole copy at each Tier-1
TAG: high-level physics objects, run info (event directory); <10 kB/evt
Plus MC in ~1:1 ratio with data
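The RAW figure can be roughly cross-checked from the per-event size and trigger rate. The live time below is an assumption (the customary ~10^7 effective seconds per LHC year), not a number from the slide.

```python
# Rough cross-check of the RAW data volume, assuming ~1e7 live
# seconds per LHC year (assumption, not from the slide).
event_size_mb = 1.5      # RAW size, MB/event
rate_hz       = 200      # HLT output rate, quoted as an upper bound
live_seconds  = 1.0e7    # assumed effective seconds/year

events_per_year = rate_hz * live_seconds                   # 2e9 events
raw_pb_one_copy = event_size_mb * events_per_year / 1e9    # MB -> PB
print(events_per_year, raw_pb_one_copy)  # 2000000000.0 3.0
# Two copies would give ~6 PB, in the same ballpark as the quoted
# ~5.0 PB/year (the slide's 200 Hz is '<200 Hz', an upper bound).
```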
Data Flow
Prioritization will be important: in 2007/8, computing system efficiency may not be 100%. Cope with potential reconstruction backlogs without delaying critical data; reserve the possibility of "prompt calibration" using low-latency data. Also important after first reco and throughout the system, e.g. for data distribution and "prompt" analysis.
Streaming: classifying events early allows prioritization. Crudest example: an "express stream" of hot / calib. events. Propose O(50) "primary datasets", O(10) "online streams". Primary datasets are immutable, but can have overlap (assume ~10%); analysis can draw upon subsets and supersets of primary datasets.
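The streaming idea can be sketched as follows. The trigger names and dataset names are invented for illustration; the point is only that each event is routed by its trigger decisions into one or more immutable primary datasets, with overlap allowed.

```python
# Illustrative sketch of streaming into overlapping primary datasets.
# Trigger and dataset names below are hypothetical.
ROUTING = {                  # trigger bit -> primary dataset(s)
    "HLT_mu":   ["Muons"],
    "HLT_e":    ["EGamma"],
    "HLT_jet":  ["JetMET"],
    "HLT_mu_e": ["Muons", "EGamma"],   # overlapping assignment
}

def primary_datasets(fired_triggers):
    """Return the set of primary datasets an event belongs to."""
    out = set()
    for t in fired_triggers:
        out.update(ROUTING.get(t, []))
    return out

# An event firing a cross-channel trigger lands in both datasets:
assert primary_datasets(["HLT_mu_e"]) == {"Muons", "EGamma"}
assert primary_datasets(["HLT_jet"]) == {"JetMET"}
```

Because the classification happens online, downstream prioritization (e.g. reconstructing the express-stream datasets first) needs no event-by-event inspection.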
Tier-0 Center
Functionality: prompt first-pass reconstruction (NB: not all HI reco can take place at the Tier-0); secure storage of RAW&RECO, distribution of a second copy to the Tier-1s
Responsibility: CERN IT Division provides a guaranteed service to CMS; cast-iron 24/7; covered by a formal Service Level Agreement
Use by CMS: purely scheduled reconstruction use; no "user" access
Resources: CPU 4.6 MSI2K; Disk 0.4 PB; MSS 4.9 PB; WAN 5 Gb/s
Tier-1 Centers
Functionality: secure storage of RAW&RECO and of subsequently produced data; later-pass reconstruction, AOD extraction, skimming, analyses that require rapid, scheduled access to large data volumes or RAW; support and data serving / storage for the Tier-2s
Responsibility: large CMS institutes / national labs. Firm sites: ASCC, CCIN2P3, FNAL, GridKA, INFN-CNAF, PIC, RAL. Tier-1 commitments covered by the WLCG MoU
Use by CMS: access possible by all CMS users (via standard WLCG services), subject to policies, priorities, common sense, …; "local" use possible (co-located Tier-2), but no interference
Resources: require six "nominal" Tier-1 centers; will likely have more physical sites. CPU 2.5 MSI2K; Disk 1.2 PB; MSS 2.8 PB; WAN >10 Gb/s
Tier-2 Centers
Functionality: the "visible face" of the system; most users do analysis here; Monte Carlo generation; specialized CPU-intensive tasks, possibly requiring RAW data
Responsibility: typically CMS institutes; a Tier-2 can be run with moderate effort; we expect (and encourage) federated / distributed Tier-2s
Use by CMS: "local community" use: some fraction free for private use; "CMS controlled" use: e.g. hosting an analysis group with "common resources", agreed with the "owners" and with "buy-in" and interest from the local community; "opportunistic" use: soaking up of spare capacity by any CMS user
Resources: CMS requires ~25 "nominal" Tier-2s; likely to be more physical sites. CPU 0.9 MSI2K; Disk 200 TB; no MSS; WAN >1 Gb/s. Some Tier-2s will have specialized functionality / greater network capacity
Resource Evolution
[Charts: CPU (MSI2k), Disk (PB) and Tape (PB) for 2007-2010, broken down into Tier-0, CMS-CAF, Tier-1's total, Tier-2's total]