12009/05/04
Introduction to Methodologies and Tools for the Grid
Supplementary course to the teaching of Ingegneria del Software 2 (Software Engineering 2)
Contacts: [email protected] (353-2789, BioLab, Villa Bonino -1)
Exam format: N points out of 30, with N = 0..4
Slides available online: yes, later on
Disclosure agreement
• Grid != High Performance Computing
• Grid == A large ICT infrastructure supporting large research infrastructures (LHC)
• Grid ~= Peer2Peer computing (SETI@home)
• “I have too many servers” → Utility computing
• Utility Computing + marketing ~= Cloud computing
• Grid + business models + virtualization ~= Cloud Comp.
• Summary:
– Research + Physics → Grid
– Business + Marketing + Grid + (1 - Physics) → Cloud
• In this course: Grid = rnd(Cloud, Grid, … , … , ...);
Outline
• Introduction
• Service Utilities
• The Grid computing model
• EGEE and other EU Projects
• Cloud computing
Introduction
• Architectures evolution
• Context size: computing, storage, network
• Use cases
IT Architecture evolution
Architecture: # CPU
• Personal Computing: 1-2
• Client/Server Architecture: 2-4
• Multi-Tier Architecture: 3-10
• Server Farm: 5-50
• Cluster: 5-100
• Multi-Cluster: 100-1000
• High Performance Computing: thousands
• Grid Computing: >40000
• Peer To Peer Computing: millions
IT dimensions
• Computing: flops/specints/mips …
• Storage: bytes
• Network: bit/sec
[Figure: three axes: 1. Computing, 2. Storage, 3. Network]
Part I - Introduction
IT Actors: “Plug & Use” models
• Grid Computing introduces a new usage model for computing
• Storage models are already active (hosting/housing, storage leasing)
• Network models are relevant (Internet providers, TLC carriers, last-mile providers)
[Figure: the three axes (1. Computing, 2. Storage, 3. Network) labelled with Utility / Cloud Computing Providers, Housing/Hosting Providers, and Internet Service Providers]
1. Computing
• State of the art: 32 nm, billions of transistors, 3-4 GHz / 64 bit, 4-8 cores, 100 watts or less
• User: software
• Cost: ~200 €/CPU
• Limits
– Existing software
– Existing programmers
• Alternatives exist
– Manycore/GPU
– Multicore hybrid (Cell PS3/IBM)
[Figure: Moore's law]
2. Storage
• State of the art: 4 terabytes (HDD), ~100 GB (solid state), 200 GB (DVD/Blu-ray)
• Devices: computer, TV / audio, photo / cam / embedded
• Cost: 0.2-0.4 €/GB (HDD)
3. Network
• State of the art: 1 Gbit/s (Ethernet), 12 Mbit/s (ADSL), 10 Gbit/s (fiber)
• Devices: computer, VoIP phone, video, ...
• Cost: 20 €/month (ADSL)
• Warning: the first limits are evident (IPv4!)
• Bandwidth-shaping technologies
• Network neutrality?
[Figure: Gilder's law]
… exponential growth
• Computing doubles every 18 months (Gordon Moore, 1965)
• Storage doubles every 12 months
• Available network bandwidth doubles every 6 months (George Gilder, 1992)
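To get a feel for how far apart these three rates drift, the following minimal sketch compounds each one over the same time span. It assumes that each period on the slide is a doubling period; the function name `growth_factor` and the dictionary `DOUBLING_MONTHS` are illustrative names, not part of the course material.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Return the multiplication factor after `years`,
    assuming the quantity doubles every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)


# Periods taken from the slide, interpreted as doubling periods (assumption).
DOUBLING_MONTHS = {"computing": 18, "storage": 12, "network": 6}

if __name__ == "__main__":
    for name, months in DOUBLING_MONTHS.items():
        print(f"{name}: x{growth_factor(3, months):.0f} in 3 years")
```

Under this assumption, three years give a factor of 4 for computing, 8 for storage and 64 for network bandwidth, which is why the network stops being the bottleneck for distributed models.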
Some “killer applications” use the Peer to Peer model
• Seti@Home: 5 million users, 10^21 Flops
• Napster: 40 million users, 10 petabytes of files
• Skype: 200 million users, 7.5 billion minutes of communication
[Figure: Computing / Storage / Network axes with Seti@Home (mln-PetaFlops), Mp3 share (Petabytes) and Skype (bln-minutes)]
Grid Computing – The “utilities” model
• The Model
• The Model in the real world
• The Model applied to the Grid
Utilities model
• Network (infrastructure): mostly public and under regulated control
• Resource (account and access): the object to be transferred
• Device (consumption): the ways in which resources are consumed
• Plant (production): how resources are made available
Schema
[Figure: the providers' production plants supply resources (products) through the infrastructure to the users' devices]
Examples
• Electricity
• Gasoline
• Railways
• Logistics
• ...
• Information technology!
The Power Grid
• Devices (consume)
• Infrastructure (distribution)
• Resources (transmitted, measured in watt)
• Plant (production)
[Figure: Consume, Infrastructure, Production]
Grid Computing
• Devices: software
• Infrastructure = EU Grid (EGEE), middleware (gLite)
• Resources (unit) = #cpu, cpu time, bytes
• Plants = server farm (CERN, INFN, CNR, Universities, ...)
[Figure: Consume, Infrastructure, Production]
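The parallel between the power grid and Grid Computing can be written down as a small data structure. This is only an illustrative sketch: the class `UtilityModel` and the function `compare` are hypothetical names, and the field values are copied from the two slides above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class UtilityModel:
    """The four roles of the utilities model, as listed on the slides."""
    devices: str         # how resources are consumed
    infrastructure: str  # how resources are distributed
    resources: str       # what is transferred and measured
    plants: str          # where resources are produced


# Values copied from the "The Power Grid" and "Grid Computing" slides.
POWER_GRID = UtilityModel(
    devices="consume",
    infrastructure="distribution",
    resources="transmitted, measured in watt",
    plants="production",
)
GRID_COMPUTING = UtilityModel(
    devices="software",
    infrastructure="EU Grid (EGEE), middleware (gLite)",
    resources="#cpu, cpu time, bytes",
    plants="server farm (CERN, INFN, CNR, Universities, ...)",
)


def compare(a: UtilityModel, b: UtilityModel) -> list:
    """Return one (role, value in a, value in b) tuple per role."""
    return [(f.name, getattr(a, f.name), getattr(b, f.name)) for f in fields(a)]


if __name__ == "__main__":
    for role, power, computing in compare(POWER_GRID, GRID_COMPUTING):
        print(f"{role:15} | {power:30} | {computing}")
```

Printing the comparison row by row makes the point of the analogy explicit: every role of the power grid has a counterpart in Grid Computing.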
Originated by...
• Universities were unable to buy supercomputers
• Share commodity, already available resources
• Everyone gets access to the sum of the resources
• At the end of the '80s, Condor (www.cs.wisc.edu/condor) is developed at the University of Wisconsin
• At the end of the '90s, Ian Foster creates the Globus Toolkit (www.globus.org) and Grid Computing is born
Ian Foster: ianfoster.typepad.com
Architecture comparison
3 reference models:
• Grid Computing (GC): standard platform with shared resources
• High Performance Computing (HPC): centralized and specialized supercomputers
• Peer To Peer Computing (P2P): vertical applications based on desktop sharing, but with very limited reliability
Currently
• Several names for the same thing:
– Grid Computing
– High Throughput Computing (HTC)
– MetaComputing
– Distributed Supercomputing
• Beware of marketing people...
– Sun Grid Engine: it is a scheduler...
– Oracle 10G(rid): it is a distributed scalable RDBMS...
Virtual Organizations (VO)
• Grid is a hybrid model with a bit of P2P
• Hardware is shared inside Virtual Organizations
• VOs share resources among themselves
• VOs go beyond the physical boundaries of nations and real organizations
[Figure: a resource shared between VO 1 and VO 2]
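The sharing rules above can be pictured with plain sets. The sketch below is a toy model, not how gLite or any real middleware represents VOs: the class `VirtualOrganization`, the helper functions and all user and resource names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class VirtualOrganization:
    """Toy model of a VO: people from any institution plus a resource pool."""
    name: str
    members: Set[str] = field(default_factory=set)
    resources: Set[str] = field(default_factory=set)


def shared_resources(a: VirtualOrganization, b: VirtualOrganization) -> Set[str]:
    """Resources that both VOs can use (the overlap in the figure)."""
    return a.resources & b.resources


def can_use(vo: VirtualOrganization, user: str, resource: str) -> bool:
    """A user reaches a resource only through membership in a VO that holds it."""
    return user in vo.members and resource in vo.resources


# Hypothetical example: members come from different institutions and countries.
vo1 = VirtualOrganization("VO 1", {"anna@univ-a.it", "bob@lab-b.ch"},
                          {"cluster-a", "storage-x"})
vo2 = VirtualOrganization("VO 2", {"carla@inst-c.fr"},
                          {"storage-x", "cluster-c"})

if __name__ == "__main__":
    print(shared_resources(vo1, vo2))                # {'storage-x'}
    print(can_use(vo1, "bob@lab-b.ch", "cluster-a"))  # True
    print(can_use(vo2, "bob@lab-b.ch", "cluster-c"))  # False
```

Membership, not the home institution, decides access: this is what lets a VO cut across national and organizational boundaries.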
Open Source role
• gLite is Open Source, like Globus and Condor
• Linux is the reference operating system
• Applications are mostly Open Source
• The cooperation model is borrowed from Open Source
• The Virtual Organization model is well supported by a cooperative Open Source code development model
• Grid involves plenty of technical services for setup, maintenance, training... there is room for a traditional Open Source business model
EU investment in Grid: LHC
• The Large Hadron Collider (and black hole generator) was the use case driving the EU investment in Grid
• These demanding requirements generated all the currently available tools
• Pro: problems can be as large as the currently available resources allow
• Con: the requirements come from High Energy Physics
[Figure: Large Hadron Collider]
gLite middleware
• Services and components for a distributed Grid Computing system
– Access: CLI, API
– Security Services: Authentication, Authorization, Auditing
– Info. & Monitoring Services: Information & Monitoring, Job Monitoring
– Data Services: File & Replica Catalog, Metadata Catalog, Storage Element, Data Management
– Job Management Services: Job Provenance, Package Manager, Workload Management, Computing Element
– Also shown in the diagram: Accounting, Site Proxy
Grid Computing – EGEE
Currently:
• 240 sites
• 45 countries
• 41,000 CPUs
• 5 PetaBytes
• > 10,000 users
• > 150 VOs
• > 100,000 jobs/day
Fields of use: Archaeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, ...
Production infrastructure
Recent level: ~15k CPUs in continuous use
Usage doubled over the last year
Increasing workloads
32%
A factor-5 increase is still expected for the LHC experiments over the next year
EGEE'07; 2nd October 2007
Use of the infrastructure
EGEE: ~250 sites, >45,000 CPUs
24% of the resources are contributed by groups external to the project
~>20k simultaneous jobs
Scientific Disciplines
• Disciplines: 10
• Sub-disciplines: 36

Discipline                    6/2006   2/2007
Astronomy & Astrophysics           2        8
Computational Chemistry            6       27
Earth Science                     16       16
Fusion                             2        3
High-Energy Physics                9       11
Life Sciences                     23       39
Others                             4       14
Total                             62      118

Condensed Matter Physics, Comp. Fluid Dynamics, Computer Science/Tools, Civil Protection, Finance
Usage by Scientific Discipline
• Wide (natural) differences in total CPU utilization.
• Evidence of broad adoption of grid technology.
Resources by Discipline
• Utilization depends on having available resources.
• Good coverage of scientific disciplines for computing and storage resources.
– Sites often have more than one CE or SE defined.
– Number, not size, of resources!

Discipline   # CEs   # SEs
HEP            292     299
LS             113     123
CC              25      41
AA              57      83
Fusion          19      21
ES              42      65
Others         143     149
Unknown        288     327
Infra.         282     306
Total          366     334
Active Virtual Organizations (VO)
• Number of “active” VOs growing steadily
– Turnover: Diff. VOs in last 6 / 12 / 24 months = 83 / 92 / 102
– Total VOs: 104 registered, 258 visible
Summary of Use
• Large, growing overall utilization
• Long-term, habitual use of infrastructure.
• Broad adoption by many diverse communities
Operations progress
Progress/success:
• Production service, Oct ’06 to Sep ’07:
– Number of sites: ~190 => ~240 (x1.25 increase)
– Average number of jobs/month for the preceding 12 months: 0.97 million => 2.46 million (x2.5 increase)
– Peak number of jobs in the preceding 12 months: 1.45 million (June 06) => 3.11 million (May 07) (x2.14 increase)
– Number of CPUs: ~32,000 => ~46,000 (x1.44 increase)
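The increase factors quoted above are simple ratios of the "after" value to the "before" value. The short sketch below recomputes them from the figures on the slide; the names `METRICS` and `increase_factor` are illustrative only.

```python
# (before, after) pairs copied from the slide; job counts are in millions.
METRICS = {
    "sites": (190, 240),
    "average jobs/month": (0.97, 2.46),
    "peak jobs": (1.45, 3.11),
    "CPUs": (32_000, 46_000),
}


def increase_factor(before: float, after: float) -> float:
    """Return the multiplicative increase from `before` to `after`."""
    return after / before


if __name__ == "__main__":
    for name, (before, after) in METRICS.items():
        print(f"{name}: x{increase_factor(before, after):.2f}")
```

Since the site and CPU counts are approximate, the recomputed ratios can differ from the quoted ones in the last digit.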
Registered Collaborating Projects
Applications: improved services for academia, industry and the public
Support Actions: key complementary functions
Infrastructures: geographical or thematic coverage
25 projects have registered as of Sept 2007: web page
Where are we?
[Figure (Gartner Group): Grid on the Computing in High Energy Physics conferences timeline: 2000, 2001, 2003, 2004, 2006, 2007]
Slide courtesy of Les Robertson, LCG Project Leader
Towards a European Grid Infrastructure
• A permanent infrastructure
• Sustainable
• Large international coordination
• Public and private use
442008/07/15
Current IT trend
• Grid was intended as the answer to IT needs in terms of distributed infrastructures (processing and data management)… but the term has been overexposed
– Software as a Service: The Closest Thing to Grid's Killer Application? (http://www.gridtoday.com)
– Europe's 7th Research Framework Programme (2009-2010): from Grid to “Internet of Services and Things”
Grid, High Performance Computing, Clusters, High Availability, Virtualization, AJAX, Web applications, On-demand applications, utility computing, IT outsourcing
Cloud computing
Software as a Service
Marketing operation!
Use cases
• IBM
– Blue Cloud datacenter
• Amazon
– Elastic Compute Cloud, Simple Storage Service, SimpleDB
• Oracle
– Oracle DB and applications aaS, on-demand services
• Google
– App Engine, Docs, Bigtable (DB)
• HP
– Adaptive Infrastructure aaS
• Many more:
– Salesforce and about 30 other “vendors”
Different problems require different skills
Problem: Scalable hardware infrastructure, managing computing peaks
Solution: Grid middleware, virtualization, cloud & utility computing services. Integration of existing commercial and free tools, help in developing datacenters that provide these services to support SaaS applications

Problem: Provide powerful development tools and offer a user interface to end users (SaaS offering)
Solution: Abstraction layer for developers, hiding the complexity of the different middleware and IT resources. Provide a portal, modern Web 2.0 technologies, integration with libraries and toolkits (e.g. Google). Async Web applications (plenty of AJAX), data & metadata management, semantics and ontologies (mash-up)

Problem: Computing-intensive tasks (e.g. financial real-time data analysis, statistics)
Solution: Parallelization and porting to modern architectures, clustering, computing farms. Competences in hardware (parallel architectures, many / multi core solutions) and software (high performance computing, development and porting)
Real world examples
Problem: Scalable hardware infrastructure, managing computing peaks
Case study: Goods and people transport planning and routing (on-demand transport, large supermarket chains)

Problem: Provide powerful development tools and offer a user interface to end users (SaaS offering)
Case study: End-user Web services for non-tech users (health and non-health biotechnology, pharma, agro & food), data integration tools (control panels for large plant/infrastructure monitoring and management)

Problem: Computing-intensive tasks (e.g. financial real-time data analysis, statistics)
Case study: High performance image processing, financial risk calculation, engineering (CAD/CAM, e.g. naval, aeronautical, automotive, F1), hydrometeorology (normal and extreme events forecasting)